I. REAL PARTY IN INTEREST

Bio-Rad Laboratories, Inc. is the assignee of the above-referenced patent
application by an assignment from MJ Bioworks, Inc. and is thus the real party in interest.

II. RELATED APPEALS AND INTERFERENCES

There are no related appeals, interferences, or judicial proceedings at this time. 

III. STATUS OF THE CLAIMS 

Claims 1-14, 16, 18, 19, 21, and 31 are cancelled. 

Claims 15, 17, 20, 22-30, and 32-44 are pending. 

No claims are withdrawn from consideration but not cancelled. 

No claims are allowed. 

No claims are objected to. 

Claims 15, 17, 20, 22-30, and 32-44 are rejected. 

Claims 15, 17, 20, 22-30, and 32-44 are being appealed. 

IV. STATUS OF AMENDMENTS 

No amendments after the final Office Action were submitted. Claims 15, 17, 20, 
22-30, and 32-44 on appeal herein are as amended in the Response to the Office Action filed 
February 4, 2008. 

V. SUMMARY OF CLAIMED SUBJECT MATTER 

A. Claim 15-Independent 

The subject matter of independent claim 15 relates to a protein comprising two 
joined heterologous domains: (1) a sequence non-specific double-stranded nucleic acid binding 
domain that comprises an amino acid sequence that has at least 75% sequence identity to SEQ ID 
NO:2; and (2) a DNA polymerase domain; where the presence of the sequence non-specific 
double-stranded nucleic acid binding domain enhances the processivity of the polymerase 
domain compared to an identical protein that does not have the sequence non-specific
double-stranded nucleic acid binding domain joined to it. Support for this claim can be found, e.g., on
page 13, line 32 bridging to page 14, line 13. 

B. Claim 30-Independent 

The subject matter of independent claim 30 relates to a protein comprising two 
joined heterologous domains: a sequence non-specific double-stranded nucleic acid binding 
domain that comprises an amino acid sequence that has at least 75% sequence identity to the 
Sac7d sequence set forth in amino acids 7-71 of SEQ ID NO: 10; and a DNA polymerase 
domain, where the presence of the sequence non-specific double-stranded nucleic acid binding 
domain enhances the processivity of the polymerase domain compared to an identical protein 
that does not have the sequence non-specific double-stranded nucleic acid binding domain joined 
thereto. Support can be found, e.g., in SEQ ID NO: 10 and at page 12, lines 8-9 and page 14, 
lines 5-9. 

C. Claim 43-Dependent 

The subject matter of dependent claim 43 relates to a protein as in claim 15 where 
the sequence non-specific double-stranded nucleic acid binding domain comprises an amino acid 
sequence that has at least 85% sequence identity to SEQ ID NO:2. Support can be found, e.g.,
on page 14, lines 5-9. 

D. Claim 44-Dependent 

The subject matter of dependent claim 44 relates to the protein of claim 30, where 
the sequence non-specific double-stranded nucleic acid binding domain comprises an amino acid 
sequence that has at least 85% sequence identity to the Sac7d sequence set forth in SEQ ID
NO:10. Support can be found, e.g., in SEQ ID NO: 10 and at page 12, lines 8-9 and page 14, 
lines 5-9. 

E. Claim 34-Dependent

The subject matter of dependent claim 34 relates to the protein of claim 30, where 
the sequence non-specific double-stranded nucleic acid binding domain comprises an amino acid 
sequence that has at least 90% sequence identity to the Sac7d sequence set forth in SEQ ID
NO:10. Support can be found, e.g., in SEQ ID NO: 10 and at page 12, lines 8-9 and page 14, 
lines 5-9. 

F. General Summary of Claimed Subject Matter 

The pending claims relate to polymerase proteins that are defined by two 
domains. The first domain is a polymerase domain. The second domain is a nucleic acid 
binding domain that improves the processivity of the polymerase domain. Processivity is the 
ability of the polymerase to remain attached to the template and incorporate nucleotides into a 
second strand of nucleic acid that is being synthesized from the template. 

The polymerase domain is defined by its function. The nucleic acid binding 
domain is defined by its percent identity to a prototype protein, Sso7d or Sac7d, and its ability to 
increase processivity of a polymerase to which it is joined. The claims argued specifically in this 
appeal are drawn to various embodiments in which the sequence non-specific double-stranded 
nucleic acid binding domain has at least 75% identity to the Sso7d reference sequence
(claim 15); at least 75% identity to the Sac7d reference sequence (claim 30); at least 85%
identity to the Sso7d reference sequence (claim 43); at least 85% identity to the Sac7d reference
sequence (claim 44); and at least 90% identity to the Sac7d reference sequence (claim 34).

VI. GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL

The rejection of claims 15, 17, 20, 22-30, and 32-44 under 35 U.S.C. § 112, first
paragraph, as not enabled is to be reviewed on appeal.

VII. ARGUMENT

A. Rejection and Examiner's Arguments 

1. General summary of rejection 

There is one rejection in the Final Office Action dated April 7, 2008. Claims 15, 
17, 20, 22-30, and 32-44 are rejected under 35 U.S.C. § 112 for alleged lack of enablement. The 
Examiner's position is that it would require undue experimentation to determine an Sso7d and/or 
Sac7d variant that has at least 75%-90% identity to the reference Sso7d or Sac7d sequence, and 
that retains DNA binding activity and the ability to enhance processivity of a polymerase to 
which it is joined. The Examiner cites three references, which are discussed in detail below, as 
specifically teaching that sequence similarity alone does not necessarily provide a predictable 
correlation between the structure and specific function of a protein. The Examiner contends that 
neither the art nor the specification teaches what "other" domains, regions or specific amino acids
of Sso7d or Sac7d are responsible for sequence non-specific double-stranded DNA binding or 
enhancing processivity of an attached polymerase. 

The Examiner acknowledges that the art, including the references that he cites, 
provides evidence of an association between double-stranded DNA binding activity and the 
ability to increase the processivity of an associated polymerase polypeptide. However, the 
Examiner maintains that this guidance is not specific beyond the fact that this relationship exists 
(see, e.g., page 6, first sentence of the April 7, 2008 Final Office Action). The Examiner thus also
disputes that the DNA binding activity of Sso7d is sufficiently connected to the ability to 
enhance processivity such that one of skill can recognize variants that would reasonably be 
expected to enhance processivity based on identification of residues involved in DNA binding in 
the Sso7d and Sac7d structures that are in the prior art. 

2. Summary of rejection as it relates to three references cited by the Examiner in 
alleged support of the enablement rejection 

The Examiner cites three references as evidence that allegedly support the 
rejection. These references are first cited in the Office Action mailed January 4, 2007. The 
Examiner characterizes each of the references as teaching that a single point mutation of Sso7d
affects the function of the nucleic acid binding domain and as therefore demonstrating the
unpredictability of determining variants that would have the claimed function. Specifically, a
post-filing reference, Wang et al., Nucl. Acids Res. 32:1197-1207, 2004 ("Wang"), co-authored by
the inventor, is characterized by the Examiner as teaching that mutation of Trp24 of Sso7d 
significantly reduces its effectiveness in enhancing processivity. Wang is also cited as allegedly 
teaching that the use of a DNA binding protein with a much higher affinity for double-stranded 
DNA could be detrimental to the catalytic activity of the polymerase and that further studies are 
needed to identify the optimal range of affinities to achieve "the ultimate balance between 
processivity and catalysis" (Final Office Action, page 5). The Examiner describes Shehi et al, 
Biochemistry 42:8362-8368, 2003 ("Shehi"), another post-filing reference, as teaching that an 
Sso7d protein in which Glu53 is deleted could not be isolated and as suggesting that the mutation 
misfolds the protein. The Examiner also characterizes Shehi as teaching that an Sso7d protein 
having a deletion of Leu54 has limited solubility in aqueous solution. Last, Consonni et al, 
Biochemistry 38:12709-12717, 1999 ("Consonni"), is characterized by the Examiner as teaching 
that mutation of F31A and W23A in Sso7d impairs the capacity of the protein to bind dsDNA. 
The Examiner alleges that these various mutations demonstrate the unpredictability of the effect 
of point mutations in Sso7d on any particular function or attribute of Sso7d. 

These references will be individually addressed in section VII.C.2.b.3.

B. Legal Standards for Enablement

It is well-settled in the biotechnology art that routine screening of even large 
numbers of samples is not undue experimentation when a probability of success exists. In re 
Wands, 858 F.2d 731, 8 USPQ2d 1400 (Fed. Cir. 1988). As stated in Wands, "enablement is not 
precluded by the necessity for some experimentation, such as routine screening." In re Wands, 
858 F.2d at 737, 8 USPQ2d at 1404 (Fed. Cir. 1988). The fact that experimentation may be 
complex does not render it undue. 

As set forth by the Federal Circuit in Wands (supra, 8 USPQ2d 1400, 1404),
multiple factors should be considered when determining whether any necessary experimentation 
is undue. These factors include: 

(a) the breadth of the claims; 

(b) the nature of the invention; 

(c) the state of the prior art; 

(d) the level of one of ordinary skill; 

(e) the level of predictability in the art; 

(f) the amount of direction provided by the inventor; 

(g) the existence of working examples; and 

(h) the quantity of experimentation needed to make or use the invention based on 
the content of the disclosure. 

C. Claims 15, 17, 22-30, 32, and 35-44 are enabled. 

The specification provides examples that show that both Sso7d and Sac7d 
increase processivity when joined to polymerases (see, e.g., Figures 1 and 2), and directs the
practitioner to the large body of art in this field that provides detailed structural insight into the 
interaction of Sso7d and Sac7d with DNA. In addition, a Declaration under 37 C.F.R. § 1.132 
by Dr. Peter Vander Horn (Evidence Appendix A, submitted with Applicants' response filed 
March 2, 2004 and referred to herein as "the Vander Horn Declaration") provides objective 
reasons, based on the detailed knowledge of Sso7d and Sac7d in the art, justifying the percent
identities recited in the current claims. 

Further, the prior art provides evidence that one of skill can in fact reasonably 
predict the effects of point mutations in Sso7d on double-stranded DNA binding and hence, on 
the ability of a variant protein to enhance processivity. 

1. Teachings and examples in the specification 

The specification teaches that the Archaeal small basic DNA binding proteins 
Sso7d and Sac7d and variants thereof having the recited percent identities can be used as DNA 
binding domains to enhance polymerase processivity when joined to polymerases. In particular, 
the specification provides reference sequences for the two proteins (SEQ ID NO:2 contains the 
Sso7d sequence, and SEQ ID NO: 10 contains the Sac7d sequence), which were characterized in 
the art prior to Applicants' invention, and directs a practitioner to exemplary references
describing such studies (e.g., Baumann et al., Structural Biol. 1:808-819, 1994 and Gao et al.,
Nature Struct. Biol. 5:782-786, 1998; both cited at page 12, lines 8-15; copies provided as
Exhibits 9 and 3, respectively, of the Vander Horn Declaration). In addition, the application
provides general guidance for determining percent identity using well known methods (see, e.g., 
the section beginning on page 14, line 5 of the specification) and for performing functional 
assays. The functional assays include those that evaluate sequence non-specific double-stranded
DNA binding activity of a DNA binding domain, and assays that evaluate modified polymerases for
enhanced processivity (see, e.g., page 28, lines 16-33).

Furthermore, the specification exemplifies both Sso7d and Sac7d polymerase 
fusion proteins. First, the specification provides data showing that Sso7d enhances
processivity of both Taq and Pfu polymerases (see, e.g., page 34). The two polymerases are in
different polymerase families. Taq is a family A polymerase; Pfu is a family B polymerase (see, 
e.g., Example I, first paragraph on page 31 and page 32, second full paragraph). The 
specification additionally provides data demonstrating that Sso7d can be joined at either its
N-terminus or C-terminus to the polymerase (see, e.g., the description of the construction of
fusion polymerases that begins on page 32). In Sso7d-Taq fusions, Sso7d is joined through its
C-terminus to the N-terminus of Taq or ΔTaq. In the Pfu-Sso7d fusion, Sso7d is joined through
its N-terminus to the C-terminus of Pfu polymerase. These examples thus show that Sso7d, 
modified at either the N-terminus or C-terminus by linkage to the polymerase, can increase the 
processivity of polymerases. 

The specification also provides data demonstrating that Sac7d, which has 82% 
identity to the Sso7d reference sequence SEQ ID NO:2 (see, e.g., the Vander Horn Declaration at 
section 12 beginning on page 7) has the same effect on a polymerase as that observed with 
Sso7d. In Example 4 at page 36, lines 27-30, a Sac7d-ΔTaq fusion was evaluated in a PCR
reaction using short primers. The results (Figure 2) show that the Sac7d polymerase fusion was 
very similar to the Sso7d polymerase fusion. 

The specification thus provides ample teachings, including working examples, to 
guide one of ordinary skill in the art in practicing the claimed invention. 

2. State of the art at the time of the invention 

The Sso7d and Sac7d prototype sequences are not novel genes. Applicants are 
not claiming 75% identity to a recently discovered prototype gene. There is an extensive body of
literature in the art pertaining to the structure of Sso7d and Sac7d. In the Vander Horn 
Declaration, Dr. Vander Horn explains that Sso7d and Sac7d are part of a family of naturally 
occurring Archaeal proteins (referred to herein for convenience as "Sso7" proteins). A natural 
variation of about 76% occurs within the family (see, e.g., section 7 of the Vander Horn 
Declaration, beginning on page 2, which is discussed at greater length below). Further, analyses 
of the structures of Sso7d and Sac7d bound to DNA have been performed by several 
investigators. Dr. Vander Horn illustrates how this structural information is used to select amino 
acid residues for substitution that can reasonably be expected to preserve DNA binding function 
and accordingly, the ability to influence polymerase processivity (e.g., section 10 of the Vander 
Horn Declaration beginning on page 4, as explained below). 

a. Applicants have provided objective reasons justifying the percent identities set
forth in the claims based on known sequences. 

Not only does the subject specification provide a full disclosure of the family of
Sso7 proteins, but Applicants have also provided the Vander Horn Declaration, which provides objective
reasons justifying the 75% identity level. Dr. Vander Horn explains that by routinely comparing
the sequence differences between the family members, those of skill would immediately 
recognize where the critical and noncritical regions of the proteins are located. The family 
members are a virtual roadmap to novel variants. Dr. Vander Horn additionally explains how the 
prior art, e.g., Gao et al., provide structure-activity relationships that can be used in determining 
residues that can reasonably be expected to be substituted without compromising activity. 

According to Dr. Vander Horn, a GenBank search of Sso7d readily identifies 17
naturally occurring DNA binding proteins that have amino acid identities of between 79% and 98%
(e.g., section 7 of the Vander Horn Declaration). Indeed, in section 12 of his declaration, Dr. 
Vander Horn explains that based on naturally occurring proteins alone, domains having 79% 
identity to Sso7d or Sac7d are readily available for use in the invention. The second paragraph 
of page 18 of the Declaration further notes that three of the references cited in the specification 
(Choli et al., Baumann et al., and McAfee et al., copies of which are provided as exhibits to the
Vander Horn Declaration) contain figures with sequence alignments of Sso7d homologues, 
including Sac7d, Sac7a and Sac7e. These proteins are repeatedly described as structurally and 
functionally closely related proteins. Dr. Vander Horn concludes that "[n]o one skilled in the 
arts that reads the patent specification and the referenced papers would have objective reasons to 
think it [the proteins] wouldn't work." 

In section 13 of his Declaration, Dr. Vander Horn illustrates how one of skill can 
readily generate a protein having 76% identity to Sso7d using the natural variation that occurs in 
Sso7 family members as a road map. In addition to the natural variations between family 
members, one of skill in the art readily understands that non-naturally occurring but conserved 
substitutions are possible throughout the primary sequences of the prototype proteins. Dr. 
Vander Horn explains this conventional wisdom at section 9 of his Declaration. 

Dr. Vander Horn further notes that man-made modifications (muteins) can
additionally be generated by introducing conservative substitutions at sites selected based on 
structural information (discussed below). Such a procedure can readily generate a protein having 
lower than 60% identity to the reference Sso7d sequence that still would enhance polymerase 
processivity (section 14, beginning on page 8). 

b. Applicants have provided objective reasons justifying the percent identity set 
forth in the claims based on structure of the protein. 

Dr. Vander Horn explains at section 10, beginning on page 4 that the structure of 
Archaeal proteins when complexed with DNA has been previously studied by investigators such 
as Gao et al. Dr. Vander Horn details how this information permits a practitioner to identify the
critical DNA binding regions in the proteins, which allows one of skill to focus mutations away 
from these critical regions. Specifically, Dr. Vander Horn points to unstructured regions of 
Sso7d (first full paragraph on page 5), which are sites where divergences in Sso7 sequences 
occur, that can be targeted for mutations. Dr. Vander Horn also indicates that residues in the 
alpha helix, which do not interact with the DNA substrate, could be targeted for substitution so 
long as they preserved secondary structure (second paragraph of page 5). Furthermore, based on 
the structures, Dr. Vander Horn explains that the differences in composition and length between 
Sso7 and Sac7 cluster in the turns between beta sheets and in amino acids facing away from the 
DNA binding domain in the crystal structure and that these regions are thus also areas of 
plasticity. Finally, various lysine residues that would reasonably be expected to tolerate
substitution without compromise to DNA binding are described in paragraphs 4 and 5 on page 5. 

The Vander Horn Declaration thus illustrates how one of skill in the art can use 
the large body of knowledge in the art to identify functional Sso7d and Sac7d variants having the 
percent identity set forth in the claims without undue experimentation. Therefore, in view of the 
guidance provided in the specification, the existence of working examples, the level of skill of 
the ordinary practitioner in the art, and the depth of knowledge in this art, the claims are properly 
enabled over the entire scope. 

3. The predictability in the art

The predictability in the art refers to the ability of one skilled in the art to 
extrapolate the disclosed or known results to the claimed invention. What is known in the art 
provides evidence as to the question of predictability. MPEP § 2164.03. In the present case, the
basic techniques used in practicing the invention, e.g., those used in the art of recombinant 
expression and protein analysis, have been in existence for over two decades and have since 
improved dramatically to reach a high level of technical sophistication and predictability. In 
addition, as discussed above, much is known about the structural and functional features of Sso7d 
and Sac7d that allow one of skill to rationally predict the effects of changes in protein sequence. 
Therefore, a significant level of predictability exists in the relevant art. 

The Examiner cited three references in support of the position that it would 
require undue experimentation to make and use Sso7d and Sac7d variants in the invention. The 
references are characterized by the Examiner as showing that a single point mutation in Sso7d can
affect the function of the nucleic acid binding domain and as therefore demonstrating that the
effects of mutating Sso7d cannot reasonably be expected to be predictable by one of ordinary
skill in this art. However, the references in fact support the enablement of the claims. Applicants 
have submitted a Declaration under 37 C.F.R. § 1.132 by Dr. Yan Wang (the Wang Declaration) 
that is provided in the accompanying appendix, which explains that the experiments performed in 
the cited publications provide evidence that shows that one of skill can reasonably be expected to 
be able to use the extensive structural Sso7d/Sac7d data available in the art to predict the effects 
of sequence changes on Sso7d (or Sac7d) DNA binding activity. The Wang Declaration also 
explains that the cited art additionally shows that DNA binding activity correlates with the ability 
to modulate processivity of a polymerase to which the Sso7d/Sac7d protein is joined. 

In the cited references the authors were seeking to investigate Sso7d by 
introducing mutations that were predicted, based on the structure, to negatively affect function. 
Dr. Wang illustrates how their results validate this approach of using the structure to predict 
effects on function. In the current invention, the skilled artisan can use this same structural 
information to reasonably predict sequence changes that preserve Sso7 function, rather than
destroy it. Each of the references is individually discussed below. 

Wang 

The Examiner points to Wang as further supporting the rejection because Wang 
teaches that a change in Trp24 of Sso7d significantly reduces the effectiveness of the protein in 
enhancing processivity. Wang is a post-filing publication of the current inventor's work relating 
to polymerases that are modified by linkage to an Sso7d protein. In the Wang Declaration, Dr. 
Wang explains that in one aspect of the experiments presented in the article, it was determined 
that Sso7d double-stranded DNA (dsDNA) binding activity is important for processivity, as 
taught in the current application. As Applicant has previously noted, the interactions between 
Sso7d and dsDNA have been extensively studied. Dr. Wang explains that for the experiments 
described in her publication, the Wang reference, Trp 24 was identified in structural studies to be 
important for binding to dsDNA, as explained on page 1201, column 1 in the last paragraph. 
(Trp24 in Wang corresponds to Trp23 in SEQ ID NO:2 of the application as filed.) The 
referenced structural studies (Gao et al., Nature Struct. Biol. 5:782-786, 1998; and Catanzano et
al., Biochemistry 37:10493-10498, 1998) were readily available in the art before the current
invention. In the Declaration, Dr. Wang further explains that she purposefully selected Trp 24 
for mutation in the studies described in the Wang publication to further investigate the correlation 
between DNA binding and processivity. Three mutant Sso7d-polymerase fusion proteins in 
which Trp 24 was replaced with Val, Gly or Glu were created with the intent of reducing the 
ability of Sso7d to bind dsDNA and in turn, reducing its ability to enhance the processivity of the 
DNA polymerase. Dr. Wang points out that all three mutant fusion proteins exhibited decreased 
processivity relative to that of the wildtype Sso7d-polymerase fusion, just as they had expected. 
Substitution of Trp 24 with Glu, which had been expected to exhibit the greatest effect because it 
differs the most from the wild-type residue, also resulted in the greatest decrease in processivity. 

The experiments presented in the Wang publication therefore illustrate how one of 
skill in the art makes use of available structural information to recognize amino acid residues that 
are expected to be relevant to function. Wang and her co-authors intentionally selected a residue
that is integral to DNA binding activity based on available Sso7d structural data, fully expecting 
to compromise the function of Sso7d in enhancing polymerase processivity. This is precisely 
what was observed. Therefore, one of skill can in fact make predictions based on structural 
information that have the desired effect on function. The same structural information can be used 
to select residues that would not be expected to alter Sso7d activity. 

Consonni 

Consonni is cited by the Examiner as providing evidence that the claims are not 
enabled because a single amino acid change (at Trp 23 or Phe 31) in Sso7d can alter function. 
However, Consonni also provides another example of how structural information is predictive of 
the functional importance of particular amino acid residues. In the Wang Declaration, Dr. Wang 
explains that Consonni describes the solution structure of an Sso7d mutant protein F31A, in 
which an alanine is substituted for a phenylalanine residue at position 31. In prior studies cited in
Consonni at page 12710 in the second full paragraph of the first column, Phe 31 was selected for 
mutation on the basis of structural data that indicated that this residue is located at the core of the 
aromatic cluster and has tight contact with side chains of several residues in the cluster. This 
residue was therefore predicted to be important for Sso7d stability. 

Dr. Wang further notes that this residue is also highly conserved in Sso7 family 
members, as can be seen in a sequence comparison of Sso7d, Sac7d, Sac7a, and Sac7e (see, the 
Vander Horn Declaration). As the authors expected, the mutation of Phe 31 to Ala led to a loss 
in thermo- and piezostabilities (third paragraph of column 1, page 12710). The analysis presented
in the Consonni paper cited by the Examiner relates to the solution structure of the F31A
mutation, which was performed in order to determine the structural changes that were associated 
with the loss of stability of the mutant protein. 

Dr. Wang points out that Consonni observed that in the solution structure of the 
F31A mutant, the Trp 23 residue was reoriented such that it pointed inside the aromatic cluster.
Because of the previously identified role of Trp23 in contacting DNA (Trp 23 is the same residue
as Trp24 in Wang), Consonni investigated the DNA-binding activity of the mutant F31A protein.
The results showed that the binding activity was also impaired, once more highlighting that Trp 
23 plays an important role in DNA binding, as indicated by the structure. 

With regard to the loss of stability observed in the F31A mutant protein, Dr. Wang 
indicates that it is not surprising that the mutation affected Sso7d stability. As explained in the 
Wang Declaration, it is well known in the field that an amino acid with a large, buried 
hydrophobic side chain stabilizes conformation. Accordingly, it is predictable that changing the 
large hydrophobic side chain to a small side chain would result in a loss of stability. In designing 
mutations that are expected to preserve function, Dr. Wang further notes that it is standard 
practice in the art to avoid radically mutating such residues, if it is desired to preserve function, 
just as it would be desirable to avoid mutating those residues that directly contact DNA to 
preserve DNA binding function. 

Shehi 

Shehi investigated the function of the C-terminus of Sso7d. Shehi created an 
Sso7d protein that was truncated at Leu54 (L54Δ) in order to investigate the role of the
C-terminal α-helix on stability and DNA binding activity. Dr. Wang notes that this region does not
contact the DNA in the structural studies of Sso7d and Sac7d DNA binding interactions. Dr.
Wang then explains that to determine whether deletion of the C-terminal region had affected
DNA binding, the authors analyzed the binding of L54Δ to double-stranded calf thymus DNA in
comparison to the binding activity of wildtype Sso7d. Dr. Wang points out in her Declaration
that the association constant for binding of L54Δ to double-stranded DNA was similar to that of
Sso7d, thus showing that deletion of the eight residues at the C-terminus of Sso7d did not result 
in loss of DNA binding activity, which was predictable based on the structure. 

The authors also observed that a variant that was truncated at Glu 53 could not be 
isolated under the same conditions that allowed them to isolate L54Δ and noted that this
highlights the role that Leu 54 plays in the folding process. Shehi observes that Baumann and
colleagues (Nat. Struct. Biol. 1:808-819, 1994) in fact described that the side chain of Leu54 is
packed well against that of Ala50, anchoring the C-terminal end of the chain to the protein core.
Other investigators also confirmed that Leu54 is involved in strong van der Waals interactions 
with the remaining part of the protein. Thus, as Dr. Wang indicates, the available Sso7d/Sac7d 
structural data provided information on the role of Leu 54 that was borne out by Shehi's studies. 

Dr. Wang further explains that Shehi's results are consistent with the analysis of 
Sso7d structure provided by the Vander Horn Declaration that is of record in this application. Dr.
Wang points out that Dr. Vander Horn has indicated that in the context of DNA binding activity, 
the alpha helix is highly mutable, as evidenced by the fact that natural variation of Sso7 
homologs is observed in this domain. Dr. Vander Horn cautioned, however, that the naturally 
occurring mutations in this domain appear to preserve the alpha helix. Thus, in designing Sso7d 
variants for use in the invention, one of skill would introduce mutations that preserved structure. 
Dr. Wang further notes that the L54 residue is also conserved across the naturally occurring 
Sso7 proteins, which also would be an additional consideration in designing variants. 

Shehi mentions that there were difficulties in isolating the deletion in which the 
C-terminus was truncated at Glu53 under the same conditions that were used to isolate L54Δ.
Shehi also noted that L54Δ has a limited solubility in aqueous solution. The Examiner contends
that "both mutations demonstrate the unpredictability of the effect of point mutations in Sso7d on 
any particular function or attribute of Sso7d." However, Dr. Wang explains that one of skill 
cannot conclude from the experiments in Shehi that the effects of point mutations at Glu53 or 
L54 would be unpredictable. Dr. Wang first notes that Shehi investigated deletion mutations, not 
point mutations, and that the effects observed in deleting most of the C-terminal α-helix cannot 
be extrapolated to the effects of introducing point mutations into that region. 

In terms of the limited solubility of L54A, the authors believe that this is likely 
due to the loss of three net charges and the exposure of hydrophobic moieties upon deleting the 
last eight residues. Dr. Wang points out that it is recognized in the art that changing the charge 
of a protein and exposing hydrophobic residues can influence solubility. She indicates that a 
practitioner in this art can additionally consider such effects in designing variant Sso7d 
sequences. Last, she notes that Shehi was examining L54A alone, not when fused to a 



Page 17 of 114 



Appl. No. 09/870,353 

Appeal Brief, dated September 23, 2008 



PATENT 



polymerase protein. This is relevant because the limited solubility observed by Shehi under 
these conditions would not necessarily reflect the solubility when the protein is fused to a 
polymerase. In view of the foregoing, the studies in Shehi do not provide evidence that the 
current claims are not properly enabled. 

As illustrated in the Wang Declaration, the three references cited by the Examiner 
demonstrate that prior art structural information about Sso7d/Sac7d provides a sound basis for 
rationally predicting the effects of mutations in Sso7d and Sac7d on DNA binding function and 
on the ability of the protein to increase processivity of a polymerase to which it is joined. 

4. Quantity of experimentation 

Appellants do not dispute that some experimentation may be necessary to practice 
the present invention as defined by the pending claims. Yet "the test [of undue experimentation] 
is not merely quantitative, since a considerable amount of experimentation is permissible, if it is 
merely routine, or if the specification in question provides a reasonable amount of guidance with 
respect to the direction in which the experimentation should proceed." In re Wands, 8 USPQ2d 
1400, 1404 (Fed. Cir. 1988) (citing In re Angstadt and Griffen, 190 USPQ 214, 217-19 (CCPA 
1976)). 

As MPEP §2164.01 states, complex experimentation is not necessarily undue, if 
the art typically engages in such experimentation. Because any necessary experimentation for 
practicing the claimed invention in the instant case would be routine for an ordinarily skilled 
artisan who is familiar with the well established techniques of protein analysis and molecular 
biology, such experimentation does not constitute undue experimentation. 

In order to enable a generic claim, Applicants need not enable every conceivable 
species, but only provide guidance sufficient that one of skill could reasonably expect that 
mutations could be introduced that would have predictable effects (In re Angstadt, 190 USPQ 214 
(CCPA 1976)). The art cited by the Examiner provides additional evidence that one of skill 
could reasonably expect that the structural information available for Sso7d and Sac7d can be 
used as a basis for reasonably predicting whether a substitution would affect DNA binding and 
processivity. 

The Examiner argues that there is insufficient in-depth explanation in the 
specification as to exactly how DNA binding and processivity effects are related. Wang is also 
cited in the April 7, 2008 final Office Action as teaching that the use of a DNA binding protein 
with a much higher affinity for double-stranded DNA could be detrimental to the catalytic 
activity of the polymerase and that further studies are needed to identify the optimal range of 
affinities to achieve "the ultimate balance between processivity and catalysis". However, the 
claims do not require that the polymerases have the ultimate balance between processivity and 
catalysis. Optimization is not a requirement for patentability. The evidence as a whole suggests 
that DNA binding activity and ability to enhance polymerase processivity are sufficiently 
connected such that the prior art structures of Sso7d and Sac7d complexed to DNA allow one of 
skill to reasonably predict amino acid residues that can be substituted without compromising the 
ability to increase processivity. One of skill can therefore practice the invention without undue 
experimentation. 

D. Claims 43 and 44 are additionally enabled. 

Claims 43 and 44 are also argued independently. 

Claims 43 and 44 relate to a modified polymerase that has a sequence-non- 
specific double-stranded nucleic acid binding domain that comprises an amino acid sequence that 
has at least 85% identity to SEQ ID NO:2 or to the Sac7d sequence of SEQ ID NO: 10. Claims 
43 and 44 are enabled for the reasons explained above and for additional reasons. As noted 
above, the examples in the specification show that both Sac7d and Sso7d work in the claimed 
invention. These two proteins, relative to one another, are two of the most divergent members of 
the naturally occurring family members (see, e.g., section 7 of the Vander Horn Declaration). If 
claims reciting at least 75% identity to the reference sequence, which encompasses all 18 of the 
naturally occurring Sso7d and Sac7d-related proteins identified by Dr. Vander Horn in his 
search, are not deemed to be enabled by the specification despite the facts detailed above, then it 
is submitted that claims directed to at least 85% identity should be allowable. Such proteins 
would be more closely related than the most divergent members. For example, with Sso7d there 
are 12 residues of the 63 residues at which natural variation is known. The limit of 85% 
identity would encompass variants that have less than the full range of variation, but still allow 
most changes that could be introduced into an Sso7d sequence based on the naturally occurring 
variation. For the reasons explained in the Vander Horn Declaration, such changes would 
reasonably be expected to retain function, as the naturally occurring family members have the 
same function. The same reasoning applies to proteins having at least 85% identity to Sac7d. 
Accordingly, claims drawn to protein domains having at least 85% identity to Sso7d or Sac7d are 
additionally enabled. 
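
The arithmetic behind the percent-identity limits can be illustrated with a short sketch. It assumes, as a simplification not stated in the record, that identity is measured over the full 63-residue Sso7d sequence and that variants differ from the reference by substitutions only (no insertions or deletions). The function names are hypothetical and are used here for illustration only.

```python
def max_substitutions(length: int, min_identity_pct: int) -> int:
    """Largest number of substituted positions that still meets the identity floor.

    Assumes identity is counted over the full reference length (no gaps).
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    min_identical = -(-length * min_identity_pct // 100)
    return length - min_identical


def identity_after(length: int, substitutions: int) -> float:
    """Percent identity to the reference after the given number of substitutions."""
    return 100.0 * (length - substitutions) / length


if __name__ == "__main__":
    SSO7D_LENGTH = 63        # residues in Sso7d, as stated in the text above
    VARIABLE_POSITIONS = 12  # positions with known natural variation, as stated above
    for pct in (75, 85, 90):
        print(pct, max_substitutions(SSO7D_LENGTH, pct))
    print(round(identity_after(SSO7D_LENGTH, VARIABLE_POSITIONS), 1))
```

Under these assumptions, a variant changed at all 12 naturally variable positions would fall just below the 85% limit, which is consistent with the statement that the 85% limit encompasses less than the full range of natural variation while still permitting most of it.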

E. Claim 34 is additionally enabled. 

Claim 34 is additionally argued independently. 

Claim 34 relates to a modified polymerase that has a sequence-non-specific 
double-stranded nucleic acid binding domain that comprises an amino acid sequence that has at 
least 90% sequence identity to the Sac7d sequence set forth in SEQ ID NO: 10. Claim 34 is 
enabled for all of the reasons explained above and for additional reasons. Sac7d variants having 
at least 90% identity to the reference sequence are largely unchanged in protein sequence relative 
to the reference sequence. In view of the knowledge in the art and as evidenced by the Vander 
Horn Declaration, one of skill could reasonably be expected to generate variants having such 
minor changes that would be expected to retain DNA binding activity and hence, the ability to 
enhance processivity. Furthermore, claims relating to Sso7d-polymerases in which the Sso7d 
domain has at least 90% identity to the reference Sso7d sequence were deemed patentable by the 
Patent Office (see parent application, now U.S. Patent No. 6,627,424). The same facts 
supporting the patentability of those claims would logically apply to claim 34 of the current 
application. 

F. Legal precedent supports allowing claims of the scope presently pending. 

Beyond the objective evidence provided above, legal precedent supports the 
Examiner allowing claims of the scope presently pending. In the current invention, Applicant is 
fusing two known protein families. The inventive principle is improving the processivity of 
polymerases by fusing them with an Archaeal DNA binding domain. The inventive principle is 
not a polymerase, nor is it an Archaeal DNA binding protein. There is a body of case law that 
focuses on the importance of inventive principle in considering adequacy of support in the 
specification for broad claims. Three cases are particularly illustrative. 

In In re Fuetterer, 319 F.2d 259, 138 USPQ 217 (CCPA 1963), the applicant had 
discovered that the addition of a protein with an "inorganic salt" to the materials used to make 
tire tread increased the stopping ability of tires made from the materials. The examiner argued 
that the recitation of "inorganic salts" rendered the claims too broad because the amount of 
experimentation required to successfully use undisclosed inorganic salts was undue and required 
the applicant to restrict the claims to the disclosed salts. The CCPA reversed the breadth 
rejection, explaining that this invention was the combination of inorganic salts with the other 
elements of the claims. The fact that novel inorganic salts might be later developed did not 
preclude broad claims to the inventive combination. 

Application of Herschler, 200 USPQ 711 (CCPA 1979) is additionally instructive 
in clarifying enablement requirements regarding claims reciting old elements. Although the 
decision in this case is in the context of written description, the same analysis with respect to the 
issue of inventive principle applies to the enablement rejection raised by the Examiner against 
the present claims. 

In Herschler, the applicant had discovered that dimethylsulfoxide (DMSO) was 
useful as a transdermal carrier for physiologically active steroids. The CCPA found that a 
priority application describing a single steroid (dexamethasone 21-phosphate) supported a claim 
to the genus of all steroids. The CCPA explained that Herschler's claims were not drawn to a 
novel steroid but to a method of administering steroids. As long as the class of steroids could be 
expected to be carried across the skin by DMSO, the claim could encompass any steroid, known 
or unknown. Following earlier case law, the CCPA reminded the Patent Office that the 
"inventive principle" was directed to a method of administration of steroids and that the specific 
steroid exemplified was not the point of patentability. 

Herschler provides guidance in identifying the inventive principle and its effect 
on questions of written description and enablement. There the court stated: 

The solicitor urges that the class of steroids is so large 
that a single example in the specification could not describe the 
varied members with their further varied properties. We disagree 
with this contention. Steroids, when considered as drugs, have a 
broad scope of physiological activity. On the other hand, 
steroids, when considered as a class of compounds carried through 
a layer of skin by DMSO, appear on this record to be chemically 
quite similar. (Herschler, at 717) 

The CCPA is saying that the PTO mistakenly focused its concern on the claim element 
"steroids." Logically following that error, the PTO then argued that all steroids were not yet 
known and therefore any claim embracing the entire genus was not properly supported. This was 
an irrelevant truth because the initial premise was in error: the inventive element was not 
steroids, but their use in combination with a transdermal carrier. 

In re Lange, 209 USPQ 288 (CCPA 1981) further emphasizes the importance of 
inventive principle. In Lange, the invention related to a circuit breaker that quenches an electric 
arc produced between electrodes by use of an electronegative gas. The PTO argued that the 
application only taught how to coat electrodes with the gases and not how to forge them with the 
gases. This was true, but the court recognized that the invention was not how to make electrodes 
but the discovery that the use of the gases would prevent arcing ("the method of forming the 
electrodes is not the inventive principle." Lange, at p. 295). The court further stated that: 

Although appellant can be required to limit his claims to that 
subject area that is adequately disclosed, existence of species that are 
not adequately disclosed does not require that entire application be 
found nonenabling; this is especially true in case in which inadequately 
disclosed method is not inventive principle. (Lange, at 289). 

Thus, the inventive principle in Fuetterer was the "use" of inorganic salts with the 
other elements of the claims; in Herschler it was the "use" of DMSO to transdermally transport 
all steroids; and in Lange the inventive principle was the "use" of gases to prevent arcing. In a 
parallel fashion, the instant invention concerns the "use" of an Archaeal sequence non-specific 
double-stranded nucleic acid binding protein to improve processivity of a polymerase. 

The Examiner makes much argument about the failure of the specification to 
discuss all Archaeal Sso7d proteins. The fact that not all Archaeal binding proteins are known is 
an irrelevant truth because that degree of enablement is not required to allow a claim that does 
not rely on that element for its patentability. One of skill would understand that many DNA 
binding proteins from Archaeons, as a genus, are capable of binding double-stranded DNA 
nonspecifically. And, if provided with a novel protein, one of skill could easily determine, with 
no undue experimentation, whether or not the novel protein binds nonspecifically to nucleic acid. 

The case law cited by the Examiner does not properly support the Examiner's position. 

The Examiner relied on In re Fisher, 166 USPQ 18 (CCPA 1970) to support his 
position. Fisher, however, is not applicable to the facts presented here. In Fisher, the invention 
was a hormone, ACTH, that has 39 amino acids. The inventors determined that the first 24 
residues of ACTH are conserved across several animals. The rejected claims read on any ACTH 
protein in which the first 24 amino acid residues were the conserved sequence and that sequence 
was specifically recited in the claim. While such a claim may not be a problematic claim today, 
in 1970 it was not technically possible to make ACTH chemically and all the natural known 
species had 39 amino acids. Because there was no way to make an ACTH of other than 39 
amino acids in length, the claim was properly rejected by the CCPA as non-enabled. As the 
court said: 

We have already discussed, with respect to the parent 
application, the lack of teaching of how to obtain other-than-39 
amino acid ACTHs. That discussion is fully applicable to the 
instant application, and we think the board was correct in 
finding insufficient disclosure due to this broad aspect of the 
claims (In re Fisher, at 23) 

Procedurally, the rejection of a claim to a protein reciting a "signature sequence" is no longer an 
issue because of advances in protein chemistry. There was nothing inherently wrong with the 
Fisher claim structure; it was simply written before technology could enable it. That is not true 
in our situation. Following natural variations as a road map and applying routine mutagenesis 
techniques, those of skill can routinely create variations of Sso7d and Sac7d that are at least 75% 
identical to each other or greater. 

Recent Board decision supports allowing the claims. 

Appellants request that the Patent Office take note of the Board's recent decision 
in Ex parte Yuejin Sun et al. (unpublished decision, Appeal No. 2003-1993, Bd. Pat. App. Int., 
Jan. 20, 2004). Although the Sun decision was unpublished, the facts are so similar to 
Applicants' circumstances that the opinion is powerfully persuasive in favor of allowing the 
pending claims. In Sun, the invention was a novel plant gene encoding a protein called 'wee1'. 
The claims at issue claimed a nucleic acid having "at least 80% identity to the entire coding 
region of SEQ ID No: 1." The examiner had applied both a description and enablement 
rejection. The Board of Appeals reversed both the description and enablement rejections. 

To support the enablement rejection, the examiner in Sun employed the same 
arguments presented in the Final Office Action. Those arguments were: (i) there was no 
structure activity relationship; (ii) there were no predictable means taught for modifying the 
prototype coding region to 80% identity while retaining activity; and, (iii) there were 
insufficient examples. Although not cited by name, the Board reversed the rejections applying 
the principle set forth in In re Angstadt and Griffen, 190 USPQ 214 (CCPA 1976). In Angstadt, 
the CCPA ruled that claims that embraced some non-working embodiments were permitted 
under §112 so long as a functional assay was provided that allowed those of skill to routinely 
avoid non-working embodiments. In Sun, the Board recognized that the appealed claim was 
enabled by the disclosure of a functional assay to routinely determine when you had proteins that 
functioned and by the fact that modifications to the primary amino acid sequence of wee1 were 
also routine. 

In comparison to Sun, the facts of the instant case are even more compelling 
towards claim allowance. In Sun, the gene was novel and was the invention per se. In the 
instant application, the recited gene family is a claim element that is both well known and well 
characterized. Applicants have provided objective evidence that the claim limitation of 75% to 
85% identity to Sso7d and 75% to 90% identity to Sac7d is a reasonable approximation of the 
ability of protein chemists to alter the primary sequence of the prototype while maintaining 
biological function. 

Further, the facts in Sun can be distinguished from another recent Board decision, 
Ex parte Cortese et al. (unpublished decision, Appeal No. 2008-0763, Bd. Pat. App. Int., Jan. 24, 
2008). In Cortese, the claims at issue related to methods of screening for compounds that inhibit 
SR-BI activity by using an assay that employs an SR-BI polypeptide that is at least 95% similar 
to a human SR-BI reference sequence and an HCV E2 polypeptide to which the SR-BI 
polypeptide binds. The Board upheld the Examiner's rejection of the claims as lacking proper 
written description and as not enabled. 

Unlike in Sun, the specification in Cortese did not identify the region or regions in the 509 
amino acid SR-BI protein that are involved in HCV E2 binding. Nor did the prior art. 
Furthermore, a mouse SR-BI homolog that had 80% homology to the human sequence did not 
have the HCV E2 binding activity. The Board found that the genus of SR-BI proteins that are at 
least 95% similar to SR-BI and that bind to HCV E2 was not properly described in the absence 
of knowledge of which structural elements of SR-BI are involved in HCV E2 binding. 

The Board also affirmed the enablement rejection. Although the Board again 
noted that there was no description of the HCV E2 binding region of SR-BI in upholding the 
rejection, the Board's decision largely focused on the breadth of the SR-BI activities 
encompassed by the claims. The Board noted that the claims encompassed inhibition of 
activities of SR-BI variants other than the HCV E2 binding activity. The Board found that the 
prior art cited by the Examiner demonstrated that compounds that inhibit one SR-BI activity 
would not necessarily have any effect on any other SR-BI activity. The Board described this as 
the essence of unpredictability and affirmed the rejection for lack of enablement. 

Here, the facts are different. The specification provides working examples and 
points the practitioner to Sso7d/Sac7d structure data that teach residues involved in DNA 
binding. Further, as explained in the Vander Horn Declaration, one of skill can reasonably 
predict functional effects from the protein structure. This is borne out by the studies described in 
the three references cited by the Examiner. Last, there is a sufficient correlation between double- 
stranded DNA binding activity and the ability to increase the processivity of a polymerase such 
that one of skill can reasonably predict which Sso7d and Sac7d variant proteins will enhance 
processivity based on their DNA binding activity. 

Conclusion 

Policy Considerations 

During prosecution, the claims in the context of the decision in Ex parte Sun were 
discussed with the Examiner and Supervising Examiner. The decision in Sun related to a novel 
protein and 80% identity was considered to be enabled. Here, the claims are not drawn to a novel 
protein and neither 75%, 85%, nor 90% identity is considered by the Examiner to be enabled. 
Applicants respectfully request that the decision for this appeal be considered for possible 
publication in order to provide guidance and clarify patent office policy on protein claims that 
are cast in terms of percent identity. 

For all of the above reasons, the claims are compliant with the standards for 
enablement. It is respectfully requested that the outstanding rejection be reversed. 

Please deduct the requisite fee pursuant to 37 CFR §41.20(b)(2) from deposit 
account 20-1430 and any additional fees associated with this Brief. 

Respectfully submitted, 

Reg. No. 44,879 

TOWNSEND and TOWNSEND and CREW LLP 
Two Embarcadero Center, Eighth Floor 
San Francisco, California 94111-3834 
Tel: 415-576-0200 
Fax: 415-576-0300 

161337655 vl 

VIII. CLAIMS APPENDIX 

1.-14. (cancelled) 

15. (previously presented) A protein comprising two joined heterologous 
domains: 

a sequence non-specific double-stranded nucleic acid binding domain that 
comprises an amino acid sequence that has at least 75% sequence identity 
to SEQ ID NO:2; and 

a DNA polymerase domain 

wherein the presence of the sequence non-specific double-stranded nucleic acid 
binding domain enhances the processivity of the polymerase domain compared to an identical 
protein that does not have the sequence non-specific double-stranded nucleic acid binding 
domain joined thereto. 



16. (cancelled) 

17. (previously presented) The protein of claim 15, wherein the sequence non- 
specific double-stranded nucleic acid binding domain and the DNA polymerase domain are 
covalently linked. 



18.-21. (cancelled). 

22. (previously presented) The protein of claim 15, wherein the DNA 
polymerase domain has thermally stable polymerase activity. 



23. (previously presented) The protein of claim 15, wherein the DNA 
polymerase domain comprises a family A polymerase domain. 

24. (previously presented) The protein of claim 23, wherein the family A 
polymerase domain is a Thermus polymerase domain. 

25. (previously presented) The protein of claim 23, wherein the family A 
polymerase domain polymerase domain is a Taq polymerase domain. 



26. (previously presented) The protein of claim 22, wherein the DNA 
polymerase domain is a ΔTaq domain. 

27. (previously presented) The protein of claim 15, wherein the polymerase 
domain is a family B polymerase domain. 

28. (previously presented) The protein of claim 27, wherein the family B 
polymerase domain is a Pyrococcus DNA polymerase I domain. 

29. (previously presented) The protein of claim 28, wherein the Pyrococcus 
polymerase domain is a Pyrococcus furiosus domain. 

30. (previously presented) A protein comprising two joined heterologous 

domains: 

a sequence non-specific double-stranded nucleic acid binding domain that 
comprises an amino acid sequence that has at least 75% sequence identity to the Sac7d sequence 
set forth in amino acids 7-71 of SEQ ID NO: 10; and 

a DNA polymerase domain, 

wherein the presence of the sequence non-specific double-stranded nucleic acid 
binding domain enhances the processivity of the polymerase domain compared to an identical 
protein that does not have the sequence non-specific double-stranded nucleic acid binding 
domain joined thereto. 

31. (cancelled) 

32. (previously presented) The protein of claim 30, wherein the sequence non- 
specific double-stranded nucleic acid binding domain and the DNA polymerase domain are 
covalently linked. 

33. (cancelled) 

34. (previously presented) The protein of claim 30, wherein the sequence non- 
specific double-stranded nucleic acid binding domain comprises an amino acid sequence that has 
at least 90% sequence identity to the Sac7d sequence set forth in SEQ ID NO: 10. 

35. (previously presented) The protein of claim 30, wherein the DNA 
polymerase domain has thermally stable polymerase activity. 

36. (previously presented) The protein of claim 30, wherein the DNA 
polymerase domain comprises a family A polymerase domain. 

37. (previously presented) The protein of claim 35, wherein the DNA 
polymerase domain is a Thermus polymerase domain. 

38. (previously presented) The protein of claim 36, wherein the Thermus 
polymerase domain polymerase domain is a Taq polymerase domain. 

39. (previously presented) The protein of claim 35, wherein the DNA 
polymerase domain is a ΔTaq domain. 

40. (previously presented) The protein of claim 30, wherein the polymerase 
domain is a family B polymerase domain. 

41. (previously presented) The protein of claim 40, wherein the family B 
polymerase domain is a Pyrococcus DNA polymerase I domain. 

42. (previously presented) The protein of claim 41, wherein the Pyrococcus 
polymerase domain is a Pyrococcus furiosus domain. 

43. (previously presented) The protein of claim 15, wherein the sequence non- 
specific double-stranded nucleic acid binding domain comprises an amino acid sequence that has 
at least 85% sequence identity to SEQ ID NO:2. 

44. (previously presented) The protein of claim 30, wherein the sequence non- 
specific double-stranded nucleic acid binding domain comprises an amino acid sequence that has 
at least 85% sequence identity to the Sac7d sequence set forth in SEQ ID NO: 10. 

IX. EVIDENCE APPENDIX 

A. Declaration under 37 C.F.R. § 1.132 by Dr. Peter Vander Horn 

a) filed with Appellants' response filed March 2, 2004 to a non-final Office 
Action. 

b) The Office Action mailed May 26, 2004 acknowledged the response from 
March 2, 2004. The Office Action dated February 4, 2005 additionally acknowledged that the 
declaration is of record. 

B. Declaration under 37 C.F.R. § 1.132 by Dr. Yan Wang 

a) filed with Appellants' response filed July 5, 2007 

b) The Office Action mailed November 2, 2007 acknowledged the response of 
July 5, 2007. 

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



In re application of: WANG 

Application No.: 09/870,353 

Filed: May 30, 2001 

For: IMPROVED NUCLEIC ACID MODIFYING ENZYMES 

Examiner: Richard Hutson 

Technology Center/Art Unit: 1652 

RULE 132 DECLARATION 

Commissioner for Patents 
P.O. Box 1450 
Alexandria, VA 22313-1450 

Sir: 

I, Dr. Peter Vander Horn, being duly warned that willful false statements and the like are 
punishable by fine or imprisonment or both, under 18 U.S.C. § 1001, and may jeopardize the 
validity of the patent application or any patent issuing thereon, state and declare as follows: 

1. All statements herein made of my own knowledge are true and statements made on 
information or belief are believed to be true. The Exhibits (1-10) attached hereto are 
incorporated herein by reference. 

2. I received a Ph.D. in microbiology from Cornell University in 1991. A copy of my 
curriculum vitae is attached as Exhibit 1. 

3. I am presently employed by MJ Bioworks, Inc. as Vice President of Research, 
Development, and Engineering. I am primarily responsible for supervising research teams 
working to improve our scientific instrumentation products. MJ Bioworks is the assignee of the 
subject patent application. 

4. I have read and am familiar with the contents of the application. As I understand the 
bases for the outstanding rejections, the Examiner believes that the pending claims are overly 
broad and that it would take undue experimentation to identify members of the genus of non- 
specific double-stranded nucleic acid binding domains that are either recognized by polyclonal 
antibodies generated against Sso7d or have at least 50% identity to a 50 amino acid subsequence 
of Seq. ID No: 2 or a 75% identity to Sac7d. 

5. The criteria set forth in the claims were intended to provide us with claim scope that 
embraced both naturally occurring proteins in the family of non-specific DNA binding Archaeal 
proteins as well as "Archaeal 7 kDa muteins". By Archaeal 7 kDa muteins, I am referring to 
man-made recombinantly produced proteins that are derived from naturally occurring proteins. 
In this context, muteins differ from their parent proteins by the introduction of amino acid 
changes where those changes do not markedly alter its DNA binding properties compared to the 
parent protein. 

6. It is the intent of this declaration to explain, with objective scientific reasons, why one of 
skill can identify working embodiments that fall within the scope of these claims with routine 
experimentation. In summary, there are three objective reasons and one subjective reason. The 
three objective reasons are: (i) that genetic variation or drift within the naturally occurring 
species of Archaeal 7 kDa proteins provides an initial road map for point mutations; (ii) that 
conventional knowledge of protein chemistry allows for us to predict that biological properties 
can be preserved so long as amino acid substitutions are conservative in their nature; and (iii) that 
knowledge of the three dimensional structure of these proteins when bound to DNA permits us to 
predict areas of non-criticality where substitutions may be freely introduced beyond mere 
conservative substitutions. As a subjective rationale, we must consider that the family of 
Archaeal 7 kDa proteins come from extremophilic bacteria that live in acidic environments above 
the melting temperature of DNA. This group of extremophiles includes many unexplored species 
that by virtue of their habitats are expected to have Archaeal 7 kDa-like DNA binding proteins. 
With so many species to be studied and so few cultured it is highly probable that additional 
members of the family will be discovered with even greater variation than those that are 
presently known and sequenced. 

7. NATURAL VARIATION. 

With regard to naturally occurring 7 kDa proteins in the family of Archaeal DNA-binding 
proteins, there are many family members reported in the literature. It is an accepted convention 
that proteins with E scores below 0.01 are unlikely to occur by chance and are therefore 
statistically related. Using Sso7d as a prototype, we studied the family of Archaeal DNA binding 
proteins reported in GenBank. We noted that there are at least 17 related members of the 7 kDa 
class of Archaeal proteins, the least related of which has an E value of 9 x 10^-6. 

The evolutionary relationship between the members of this family is made quite clear 
when you conduct a BlastP search comparing Sso7d to its family members. Using the default 
parameters provided by the specification on page 16, lines 7-11 with the "Low Complexity" 
filter set to off to permit us to align the entire 63 amino acids, we get the following results: 
SEQ ID: 2  ATVXFKYKGEEKEVDISKIKXVWRVGKMISFTYDEGGGKTGRGAVSEKDAPKELLQMLEKQKK 

No.  Name                       Sequence                                                                     Identity  Similarity 
1)   RNaseP3 of S               atvkfkykgeekqvdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmmpetgkyfrhklpddypi  90%       95% 
2)   Sso7d                      meismatvkfkykgeekevdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqkk         100%      100% 
3)   (name not legible)         matvkfkykgeekqvdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmlakqkk             98%       100% 
4)   Sso7d                      matvkfkykgeekevdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqkk             100%      100% 
5)   (name not legible)         atvkfkykgeekqvdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqk               98%       100% 
6)   (name not legible)         matvkfkykgeekqvdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqkk             98%       100% 
7)   Sso7d                      atvkfkykgeekevdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqkk              100%      100% 
8)   Sso7d                      atvkfkykgeekqvdiskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqkk              98%       100% 
9)   Ssh7B                      mvtvkfkykgeekevdtskikkvwrvgkmisftydegggktgrgavsekdapkellqmlekqkk             98%       98% 
10)  Sso7d mutant               atvkfkykgeekqvdiskikkvwrvgkmisatydegggktgrgavsekdapkellqmlekqk               96%       98% 
11)  Sso7e/Sto7e                mvtvkfkykgeekevdiskikkwrvgkmisftydd-ngktgrgavsekdapkellqmleksgkk             91%       93% 
12)  Sac7a                      vkvkfkykgeekevdtskikkvwrvgkmvsftydd-ngktgrgavsekdapkelldmlarae               86%       91% 
13)  Sac7a/b/d                  mvkvkfkykgeekevdtskikkvwrvgkmvsftydd-ngktgrgavsekdapkelldmlaraerekk          81%       90% 
14)  Sac7e                      makvrfkykgeekevdtskikkvwrvgkmvsftydd-ngktgrgavsekdapkelmdmlaraekkk           79%       88% 
15)  1SAP/Sac7                  kvkfkykgeekevdtskikkvwrvgkmvsftydd-ngktgrgavsekdapkelldmlaraerekk            86%       91% 
16)  Sac7e                      akvifkykgeekevdtskikkvwrvgkmvsftydd-ngktgrgavsekdapkelmdmlaraekkk            79%       88% 
17)  Sso DNA binding protein    tvkfkykgeekqvdiskikkvxrvgkmisftydegxgk                                       92%       94% 

From the above BLASTP data, we can see that the natural variation within the family 
extends to below 80% identity. At a minimum, it was the applicants' intent to encompass in a 
single claim all naturally occurring known variants of the DNA binding Archaeal protein family. 
But our knowledge of variants can be extended to include muteins by applying our knowledge of 
protein chemistry - knowledge that is both routine and predictable in its application. 

8. MUTEINS CREATED BY COMBINING NATURALLY OCCURRING 
VARIATION. 

Muteins of Archaeal 7 kDa proteins can be readily created by those of skill exploiting 
variation within the natural members of the family to create novel combinations of variations. In 
essence, the naturally occurring members are a road map for distinguishing the critical amino acids 
from the non-critical amino acids. 

A cursory review of the family reveals that the amino and carboxyl termini are not critical 
to the functionality of these proteins. The amino and carboxyl ends are very tolerant of 
substitutions and additions. They are sites of divergence between the homologues and the 
invention. As evidence of the robust nature of these proteins, we placed entire polymerase 
domains on both the carboxyl and the amino ends without interfering with binding. This was Dr. 
Wang's rationale for claiming sequence similarity to a 50-amino acid subsequence, rather than to 
the entire protein. Biological functionality appears to be determined by the conserved amino 
acids that form the internal core of these proteins (see Choli et al. (1988) Biochimica et 
Biophysica Acta, 950: 193-203 at 202) (Exhibit 2). But even there the identity is not 100%. 

9. MUTEINS CREATED BY INTRODUCTION OF CONSERVED 
SUBSTITUTIONS. 

In addition to introducing combinations of naturally occurring variations into a 
prototype 7 kDa binding protein, those of skill can also substitute conserved amino acids for 
naturally occurring ones that have not been found to vary in nature. Classic examples of such 
pairings are lysine and arginine, alanine and glycine, glutamine and asparagine, and aspartic acid 
and glutamic acid, all of which appear in this family of proteins. For example, there are 12 
residues among the 63 residues of Sso7d in which natural variations are known. By substituting 
conserved amino acids for another 20 residues, we can easily produce a non-specific 7 kDa Archaeal 
mutein that would almost certainly work to improve processivity of a polymerase. 
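
The pairings named in this paragraph can be written down as a small lookup, which makes it easy to tally which differences between a reference sequence and a candidate mutein are conservative. The following Python sketch is illustrative only: the function names and the example variant are hypothetical, and the lookup holds only the four pairings listed above, not a complete substitution matrix.

```python
# Conservative pairings named in the declaration (one-letter amino acid codes):
# lysine/arginine, alanine/glycine, glutamine/asparagine, aspartic/glutamic acid.
CONSERVATIVE_PAIRS = {
    frozenset("KR"),
    frozenset("AG"),
    frozenset("QN"),
    frozenset("DE"),
}


def is_conservative(ref: str, var: str) -> bool:
    """True if swapping ref for var is one of the listed pairings."""
    return frozenset((ref.upper(), var.upper())) in CONSERVATIVE_PAIRS


def classify_substitutions(reference: str, variant: str):
    """List (position, ref, var, conservative?) for each differing residue.

    Positions are 1-based. Both sequences must already have equal length
    (no insertions or deletions are handled in this sketch).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        (pos, r, v, is_conservative(r, v))
        for pos, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]


# First 14 residues of SEQ ID NO:2 (Sso7d) and a hypothetical variant.
reference = "ATVKFKYKGEEKEV"
variant = "ATVRFKYKGEEKQV"  # assumed example, not a sequence from the record

for pos, r, v, ok in classify_substitutions(reference, variant):
    label = "conservative" if ok else "not in the listed pairings"
    print(f"{r}{pos}{v}: {label}")
```

A substitution reported as "not in the listed pairings" is not necessarily harmful; it simply falls outside the four classic examples given here.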

10. MUTEINS DERIVED FROM STUDIES OF THREE DIMENSIONAL 
ANALYSES. 

We need not limit our muteins to combinations of naturally occurring amino acid 
variations nor to those that are unnatural but between amino acids of similar chemical properties. 
This is because the three dimensional structure of these proteins when interacting with DNA is 
known. See Exhibit 3 Gao et al. 

Knowledge of three dimensional features provides yet another strategy permitting protein 
chemists to engineer away from the native sequences because it provides structural activity 
relationships between the protein domains and DNA. Knowing which domains play a role in 
DNA binding and which are non-critical for binding permits us to think beyond mere 
conservative amino acid substitution and to allow for Archaeal 7 kDa muteins with lower percent 
identities than if we confined our mutein development strategy to the first two objective 
approaches. 

Attached to this declaration as Exhibits 4-8 are enlargements of figures derived from the 
data of Gao et al. with an accession number of 1BNZ.¹ Exhibit 4 is a ribbon diagram of the 
crystal structure of Sso7d bound to DNA. The beta sheets of the protein are in yellow, the alpha 
helix is in green. Unstructured regions are in blue. 

¹ These figures are derived from the protein crystal coordinates that Gao 
submitted to the protein structure database. Submission is a requirement 

As predicted, the unstructured regions are sites where divergences from Sso7d among the 
group of related proteins cluster. One skilled in the art could place additional insertions into these 
sites that will decrease sequence identity in blast analyses. For example, a thermostable loop can 
be placed in the G37, G38, G39 turn. 

In addition, the entire alpha helix (green) is highly mutable. This is evidenced by the fact 
that a great deal of natural variation of the homologs is observed in this domain. It should be 
noted that the naturally occurring mutations in this domain do appear to preserve the presence of 
an alpha helix and this region does not interact with the DNA substrate. Therefore, additional 
mutations could be introduced into the alpha helix (as long as they preserve the secondary 
structure) and serve to further lower the amino acid sequence identity compared to SEQ ID 2. 

Using the three dimensional figures, those of skill could also take note that the differences 
in composition and length between Sso7 and Sac7 proteins cluster in the turns between beta 
sheets and in amino acids facing away from the DNA binding domain in the crystal structure. 
So these domains are also areas of plasticity. 

The papers cited in the patent application describe several exposed lysine residues that are 
methylated in vivo. These sites are not involved in DNA binding but appear to be regulatory. As 
our work is independent of bacterial gene regulation, these lysines could be mutated so long as 
they do not interact with the DNA substrate. As can be seen in Exhibits 5 through 8, many of 
these lysine residues project away from the domain and do not interact with DNA. These 
residues are excellent candidates for mutagenesis. One skilled in the art would recognize that 
these could be changed to arginine residues without affecting DNA binding. 

I was able to find 10 such sites by examining the crystal structure. Exhibit 5 shows lysines 
19, 40, 49, and 53 projecting away from the DNA binding surface of the protein. Exhibit 6 also 
shows lysines 49, 61, and 64. Exhibit 7 shows lysine 63 and Exhibit 8 shows lysines 5 and 13. K 
to R derivatives already exist for positions 5 and 61, validating this approach. No divergence 
from the Sso7d sequence has been observed for the remaining 8 lysines, probably because of the 
regulatory role alluded to earlier. Mutating these lysines can yield an additional 8 differences 
from SEQ ID No. 2, or 13%. 



similar to the requirement that sequences be deposited into Genbank with an 
accession number. The accession code for Sso7d protein bound to DNA is 1BNZ. 
The coordinates are viewed and turned into these figures using the program 
Cn3d, which is freely available at 
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure. 

For these varied but objective reasons, one skilled in the art could with a combination of 
conserved substitutions, insertions, deletions, and exchanges of mutable sites construct DNA 
binding proteins that are very divergent from SEQ ID: 2 and Sac7d. I will discuss specific 
percentages later in this Declaration. 

11. OTHER EXTREMOPHILES WILL HAVE ARCHAEAL 7 kDa LIKE PROTEINS. 

Beyond the objective reasons presented above, there is a subjective reason why a 
percentage below 90% is needed to avoid routine engineering around the presently issued claims. 
Although many Archaeal 7 kDa proteins have already been reported, it 
should be noted that these proteins are very abundant in Sulfolobus species. In fact, they are 
probably abundant in any organism that has to live in acid at >70°C chemolithotrophically. Here 
are the relatives of S. solfataricus, many of which are expected to contain Sso7d-related proteins. 

Archaea; Crenarchaeota; Thermoprotei; Sulfolobales 
  Sulfolobaceae 
    Acidianus 
      Acidianus ambivalens 
      Acidianus brierleyi 
      Acidianus infernus 
      Acidianus tengchongenses 
    Metallosphaera 
      Metallosphaera prunae 
      Metallosphaera sedula 
      Metallosphaera sp. GIB11/00 
      Metallosphaera sp. J1 
      Metallosphaera sp. TA-2 
      environmental samples 
        uncultured Metallosphaera sp. 
    Stygiolobus 
      Stygiolobus azoricus 
      environmental samples 
        uncultured Stygiolobus sp. 
    Sulfolobus 
      Sulfolobus acidocaldarius 
      Sulfolobus islandicus 
      Sulfolobus metallicus 
      Sulfolobus shibatae 
      Sulfolobus solfataricus 
      Sulfolobus thuringiensis 
      Sulfolobus tokodaii 
      Sulfolobus yangmingensis 
      Sulfolobus sp. 
      Sulfolobus sp. AMP12/99 
      Sulfolobus sp. CH7/99 
      Sulfolobus sp. FF5/00 
      Sulfolobus sp. MV2/99 
      Sulfolobus sp. MVSoil3/SC2 
      Sulfolobus sp. MVSoil6/SC1 
      Sulfolobus sp. NGB23/00 
      Sulfolobus sp. NGB6/00 
      Sulfolobus sp. NL8/00 
      Sulfolobus sp. NOB8H2 
      Sulfolobus sp. RC3 
      Sulfolobus sp. RC6/00 
      Sulfolobus sp. RCSC1/01 
      Sulfolobus sp. RT8-4 
      environmental samples 
        uncultured Sulfolobus sp. 
    Sulfurisphaera 
      Sulfurisphaera ohwakuensis 

So far only Sulfolobus solfataricus and Sulfolobus tokodaii genomes have been sequenced. 

Given the range of divergence in Archaeal 7 kDa DNA binding proteins set forth above from a 
tiny portion of species sequenced, it will be trivial to find additional species of these DNA 
binding proteins that will have 70% or less homology to the presently known prototypes. 

12. THE 90% LIMITATION OF THE '424 PATENT INVITES THOSE OF SKILL TO 
ENGINEER AROUND THE CLAIMS WITH EASE. 

Let's look more specifically at the information that was available prior to filing the 
subject application. Dr. Wang's earlier patent US Pat. No. 6,627,424 ['424] issued with claims 
covering 90% identity to Sso7d and identity to Sac7d. Below I have created a paired table 
comparing the relative homology between Sso7d and Sac7d and Sac7d and Sac7e. 

As you can see, close relatives of Sso7d, (i.e., Sac7a,b,d and e) are not covered by the 
recited percentage in our '424 patent claims. But a pair-wise alignment of these sequences to the 
two specific examples gives one a clear road map to implementing the invention with any of the 
naturally occurring homologues. 

Sso7d alignment to Sac7d. 

Sso7d: 1 MATVKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSEKDAPKELLQML   EKQKK 64   Identity  Similarity 
         M--VKFKYKGEEKEVD-SKIKKVWRVGKM+SFTYD+--GKTGRGAVSEKDAPKELL-ML   E++KK      80%       85% 
Sac7d: 1 MVKVKFKYKGEEKEVDTSKIKKVWRVGKMVSFTYDD-NGKTGRGAVSEKDAPKELLDMLARAEREKK 66 

Note: the percent identity changes to 82% and the similarity changes to 88% if Seq ID 2 is used. This is because Seq 
ID 2 is Sso7d without the MET. One skilled in the art would study the entire sequence. 

Sac7d aligned to Sac7e (not covered in the '424 patent because it is 79% identical to Seq ID 2). 

Sac7d: 1 MVKVKFKYKGEEKEVDTSKIKKVWRVGKMVSFTYDDNGKTGRGAVSEKDAPKELLDMLARAEREK 65   Identity  Similarity 
         M KV+FKYKGEEKEVDTSKIKKVWRVGKMVSFTYDDNGKTGRGAVSEKDAPKEL+DMLARAE++K      92%       98% 
Sac7e: 1 MAKVRFKYKGEEKEVDTSKIKKVWRVGKMVSFTYDDNGKTGRGAVSEKDAPKELMDMLARAEKKK 65 

Note: A 49 amino acid core sequence is completely identical. 

Finding alternative species not covered by the allowed claims of the '424 patent, whether 
the above recited naturally occurring species or man-made muteins, is a trivial exercise for one 
skilled in the art. No reasonable protein chemist looking at this data would doubt that Sac7e 
could increase the processivity of polymerases if traded out for Sac7d in the constructs in SEQ ID 
No. 9 and SEQ ID No. 10 of the '424 patent. 

It is also helpful to take note that three of the references Dr. Wang cited in the patent 
(Choli et al., Exhibit 2; Baumann et al., Exhibit 9; and McAfee et al., Exhibit 10) contain 
figures with sequence alignments of Sso7d homologues including Sac7d, Sac7a, and Sac7e. 
They are repeatedly described as structurally and functionally closely related proteins. The Sac7d 
construct (figure 2 of the application) was made to support the contention that these homologues 
would work. Dr. Wang clearly knew about and taught that these proteins would work in the 
invention. No one skilled in the art who reads the patent specification and the referenced papers 
would have objective reasons to think it wouldn't work. 

For these reasons, I submit that a 79% identity to Sso7d using naturally occurring 
variants is clearly enabled by the specification. 

13. ROUTINELY INTRODUCING NON-NATURAL VARIATIONS LOWERS THE 
PERCENTAGE BELOW 79%. 

Using natural variants as a road map a 79% identity is readily available. But man-made 
modifications can take this 79% identity lower. One can go lower in percent identity by merely 
combining known deviations from Sso7d. Using the family of Sac7 proteins as a road map one 
obtains the following hybrid sequence: 

Hypothetical 7d: mvkvkvrfkykgeekqvdtskiickvgrvgkmvsat^ 

The hypothetical protein 7d is 76% identical to Sso7d as shown in the alignment below. 

Sso7d:    ATVKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSEKDAPKELLQML   EKQKK 64 
          --V+FKYKGEEK+VD-SKIKKV-RVGKM+SFTYD+--GKTGRGAVSEKDAPKELL-ML   E++KK 
Hypothet: VKVRFKYKGEEKQVDTSKIKKVGRVGKMVSFTYDD-NGKTGRGAVSEKDAPKELLDMLARAEREKK 65 



14. COMBINING ALL THE INFORMATION WILL LEAD ONE OF SKILL TO 
MUTEINS HAVING LESS THAN 60% SEQUENCE IDENTITY TO SAC7d. 

Combining all of these changes together one can get a functional derivative of SEQ ID No. 
2 with less than 60% amino acid identity in a blast search. One example of such a protein 
sequence is below. 



2 

One known Sso7d divergence was not included in this alignment. The F34A 
mutation was not included because it is known to destabilize the protein. All 
other divergences are from functional proteins. 
VKVRVRFKYK GEERQVDTSR IRKVGRVGKM VSATYDDACA AACNGRTGRG AVSERDAPRE LLDMLARAER 
ERR 

We have identified other muteins of Sso7d that enhance polymerase performance. For 
instance: I 21 to V, T 47 to N, D 56 to Y, and M 64 to K in the above sequence. With one exception 
(I 21 to V), these are not conserved changes; but, they are changes that do not affect core 
structures. 

When all this information is combined, it would be straightforward to identify muteins with 
less than 60% identity to Sso7d that would still enhance polymerase performance. 

15. THE ARCHAEAL 7 kDa PROTEINS ARE AN ANCIENT PROTEIN AND 
EXISTING EVOLUTIONARY DRIFT ESTABLISHES THE HIGH PROBABILITY THAT 
MUTEINS WITH 50% IDENTITY TO ANY KNOWN SPECIES CAN BE CREATED. 

From an evolutionary perspective, this family of thermal stable DNA binding proteins is 
apparently quite ancient. There is a restriction endonuclease from Methanococcus jannaschii 
(results below), another archaeon, that a blast search of the Swissprot Database with Seq ID 
No. 2 will identify. The 47% identity of this DNA binding protein to Sso7d indicates that the 
DNA binding domain has been around for a long time and that with routine sequencing of 
genomes from the Archaeal family there will be many easily obtainable proteins with even less 
than 50% identity to Seq ID No. 2 that will work in the invention. 

>gi|10954528|ref|NP_044167.1| M. jannaschii predicted coding region MJECL41 [Methanococcus jannaschii] 
 gi|[illegible]|sp|[illegible]| Putative type I restriction enzyme MjaXP specificity protein (S protein) (S.MjaXP) 
 gi|[illegible]|pir||H64514 hypothetical protein MJECL41 - Methanococcus jannaschii plasmid pURB800 
 gi|1522674|gb|AAC37110.1| M. jannaschii predicted coding region MJECL41 [Methanococcus jannaschii] 
 Length = 432 

 Score = 30.0 bits (66), Expect = 8.5 
 Identities = 19/45 (42%), Positives = 24/45 (53%), Gaps = 1/45 (2%) 

Query: 3 VKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSE 47 
         VKF+++ E KE DI KI K W V K I    +  GG T    + E 
Sbjct: 5 VKFRWETEFKETDIGKIPKDWDV-KKIKDIGEVAGGSTPSTKIKE 48 
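
The identity and gap figures in a report of this kind follow directly from the two aligned rows. As a minimal sketch, assuming the Query and Sbjct rows reported above with internal spacing removed, the counts can be reproduced in Python as follows; the function name is hypothetical, and the "Positives" figure is not recomputed because it depends on the scoring matrix used by the search.

```python
def alignment_identity(query: str, subject: str):
    """Return (identical columns, gap columns, total columns).

    Both strings must be rows of one gapped pairwise alignment, so they
    have equal length and use '-' for gaps.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return identical, gaps, len(query)


# Aligned rows from the report (Query residues 3-47, Sbjct residues 5-48).
query = "VKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSE"
subject = "VKFRWETEFKETDIGKIPKDWDV-KKIKDIGEVAGGSTPSTKIKE"

identical, gaps, columns = alignment_identity(query, subject)
print(f"Identities = {identical}/{columns} ({round(100 * identical / columns)}%)")
print(f"Gaps = {gaps}/{columns} ({round(100 * gaps / columns)}%)")
```

Percent identity here is taken over alignment columns, including the gap column, which is the convention the report itself uses.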

Having provided multiple objective roadmaps to the creation of muteins, it needs to be 
said that actual function is always subject to empirical determination. To determine if the 7 kDa 
Archaeal muteins function as desired, the Examiner is asked to take note of the generic assay for 
DNA binding described on pages 18-19 of the specification. Here, the inventors present a 
generic method for readily and conveniently testing for operable species. 

Based on the objective reasons set forth above, I submit that the creation of Archaeal 7 
kDa muteins having 60 to 50% identity to native Archaeal 7 kDa is a matter of routine 
experimentation. 

16. DEFINING THE PROTEINS BY THEIR ABILITY TO BIND TO ANTIBODIES 
GENERATED AGAINST A PROTOTYPE LIMITS THE PRIMARY AMINO ACID TO 
DEFINED STRUCTURE. 

In addition to defining the invention by a percent identity, an alternative scope of claim 
protection was presented where the DNA binding proteins were defined as those recognized by 
polyclonal antibodies generated against specific Archaeal 7 kDa DNA binding proteins. The 
Examiner has rejected claims directed to non-specific double-stranded nucleic acid binding 
domains that are recognized by polyclonal antibodies generated against Sso7d. As I understand 
the rejection, the Examiner believes that the scope of this claim encompasses too many non- 
operable species to be considered allowable. 

In the first instance, I would like to point out that the scope of proteins encompassed by 
the language is more limited than the claims where the proteins have 50% identity. 

The use of immuno-crossreactivity to define proteins as related or unrelated is an old and 
well-recognized art. The specification, at pages 16-18 provides a routine and conventional 
means to compare unknown proteins with known proteins. 

In addition, it is well-known in the art to use antisera as identification reagents to clone 
genes, based on the expression of a protein mediated by an expression vector. If the library 
source is one of the naturally-occurring relatives of Sulfolobus solfataricus listed above, the 
probability that any cross-reacting gene obtained from the library would function to increase the 
processivity of polymerases is very high. 

But naturally occurring proteins are not the only proteins that would be expected to cross 
react with polyclonal antisera generated against the prototype Archaeal 7 kDa proteins. One 
could easily envision muteins that would retain immuno-crossreactivity. To the extent that some 
may lack function, those inoperable embodiments could be rapidly distinguished from operable 
species using the prescribed assay set forth in the specification. 

When these teachings are coupled with the generic assay for testing functionality of the 
proteins to non-specifically bind to DNA (see the specification at pages 18 and 19), I submit that 
there is no objective reason to doubt that the identification of many operable species with 50% or 
greater sequence identity with Sso7d or Sac7d with polyclonal antibodies specific to the two 
prototypes would be anything other than routine and expected. 

This Declarant has nothing further to say. 

(S. solfataricus) 

DNA-binding proteins have been extracted from the thermoacidophilic archaebacterium Sulfolobus 
solfataricus strain PI, grown at 86 °C and pH 4.5. These proteins, which may have a histone-like function, were 
isolated and purified under standard, non-denaturing conditions, and can be grouped into three molecular 
mass classes of 7, 8 and 10 kDa. We have purified to homogeneity the main 7 kDa protein and determined its 
DNA-binding affinity by filter binding assays and electron microscopy. The Stokes radius of gyration 
indicates that the protein occurs as a monomer. The complete amino-acid sequence of this protein contains 
14 lysine residues out of 63 amino acids and the calculated M_r is 7149. Five of the lysine residues are 
partially monomethylated to varying extents and the methylated residues are located exclusively in the 
N-terminal (positions 4 and 6) and the C-terminal (positions 60, 62 and 63) regions only. The protein is 
strongly homologous to the 7 kDa proteins of Sulfolobus acidocaldarius with the highest homology to protein 
7d. Accordingly, the name of this protein from S. solfataricus was assigned as DNA-binding protein Sso7d. 



Introduction 

The mode of packing for eukaryotic DNA is 
well established. A set of small basic proteins, the 
histones, are involved in the formation of compact 
DNA-protein particles which contain the double-helical 
DNA coiled around an octameric histone 
complex [1]. In bacteria, the mechanism for folding 
the long circular DNA molecule into a compact 
form is much less clear. Although a number 
of proteins have been implicated for this function 
[2], a precise description of the composition of 
'bacterial chromatin' is not yet available. 

Abbreviations: TPCK, N-tosylamido-2-phenylethylchloromethyl 
ketone; DABITC, 4-N,N'-dimethylaminoazobenzene-4'-isothiocyanate; 
SSC, 0.15 M trisodium citrate/0.015 M 
NaCl (pH 7.0); PMSF, phenylmethylsulphonyl fluoride; BSA, 
bovine serum albumin; PTH, phenylthiohydantoin. 

Correspondence: T. Choli, Max-Planck-Institut für Molekulare 
Genetik, Abteilung Wittmann, Ihnestr. 73, D-1000 Berlin 33 
(Dahlem), Germany. 

Although the structure and composition of the 
bacterial nucleoids are not very well defined, there 
is compelling evidence that bacterial DNA is 
folded into a compact complex [3,4] through the 
participation of at least three proteins [5]. In recent 
years, several histone-like DNA-binding proteins 
have been isolated from eubacteria, called 
NS1 and NS2, HU, HD or DNA-binding protein 
II. Their amino-acid sequences have been determined 
and are currently under further investigation 
[6-10]. Significant homologies have been 
found between the eubacterial proteins and the 
first protein isolated from the archaebacterium 
Thermoplasma acidophilum (for reference see Ref. 
8). Previously, at least two groups of DNA-binding 
proteins with estimated molecular masses of 9 
kDa and 6 kDa were found in several Sulfolobus 
species [11]. From our results it has become clear 
that Sulfolobus acidocaldarius contains several 
DNA-binding proteins of similar sizes with M_r 
values of 7000, 8000 and 10000 [12,13], of which 
the predominant protein, 7d [14], and three of the 
minor components (proteins 7a, 7b and 7e) have 
been sequenced recently [15]. 

In this paper we present the isolation, char- 
acterization and primary structure determination 
of the predominant 7 kDa protein from Sulfolobus 
solfataricus strain PI and compare its sequence 
with that of the other known bacterial DNA-bind- 
ing proteins. Our nomenclature for these proteins 
in the 7 kDa class is based on the increased 
basicity of the proteins in the order 7a to 7e due 
to their charge differences [12]. To avoid confu- 
sion, it should be pointed out that the primary 
structure of the dominant 7 kDa protein from S. 
acidocaldarius DSM 1616 has been determined 
[14], but at that time the organism was named 
Sulfolobus solfataricus DSM 1616. Comparison of 
DNA-binding proteins, characterization of ribo- 
somal proteins by two-dimensional gel elec- 
trophoresis and the immunological characteriza- 
tion of RNA-polymerase subunits had demon- 
strated clearly that the strain DSM 1616 is similar 
although not identical to S. acidocaldarius DSM 
639 and different from other S. solfataricus strains 
[13]. Therefore, this strain was renamed S. 
acidocaldarius DSM 1616. 

Experimental procedures 

Materials 

Sodium dodecylsulfate (SDS) was obtained 
from Serva (Heidelberg, F.R.G.). TPCK trypsin 
was obtained from Worthington (Freehold, NJ, 
U.S.A.). DABITC was from Fluka (Buchs, 
Switzerland), and recrystallized from boiling 
acetone. Ovalbumin, chymotrypsinogen A, 
myoglobin, cytochrome c and bovine trypsin 
inhibitor were from Serva (Heidelberg, F.R.G.). 
The scintillation cocktail was Beckman Ready-Solv(TM) 
EP, Beckman (Berkeley, CA, U.S.A.). All solutions 
used for protein purification contained 0.1 
mM PMSF, 0.1 mM benzamidine hydrochloride 
and 6 mM 2-mercaptoethanol. Nε-Monomethyllysine 
and the other methylated lysine derivatives 
were purchased from Serva and CalBiochem 
(Frankfurt, F.R.G.). Acetonitrile and 2-propanol 
for HPLC solutions were of LiChrosolv grade and 
all other chemicals were of pro analysis grade 
purchased from Merck (Darmstadt, F.R.G.). 

Methods 

S. solfataricus strain PI was obtained from W. 
Zillig (Munich), and cells were grown at 86 °C 
under conditions described in Ref. 12, with the 
addition of 1 g per liter casamino acids (Difco, 
Detroit, MI, U.S.A.) to the medium. 

Purification of the DNA-binding protein. S. 
solfataricus cells were suspended in Polymix-Hepes 
buffer [16]. After addition of DNAase I (RNAase 
free), the cells were broken twice in a Gaulin-Manton 
press (General Electric, Fort Wayne, IN, 
U.S.A.) at 72 MPa (9000 lb/inch^2). Cellular debris 
was removed by centrifugation (1.5 h at 
10 000 × g) and the salt concentration of the supernatant 
was raised to 1 M NH4Cl. Ribosomes 
were separated from smaller proteins by centrifugation 
overnight at 160 000 × g. The supernatant 
was dialysed against 10 mM phosphate buffer at 
pH 6.0 and applied onto a CM-Sepharose CL-6B 
column (5 × 40 cm). Proteins were eluted with a 
linear NaCl gradient from 0.05 to 0.8 M in 10 mM 
phosphate buffer at pH 6.0 (20 l, flow rate 100 
ml/h); 30 ml fractions were collected and assayed 
for protein content by SDS-polyacrylamide gel 
electrophoresis (SDS-PAGE). Further purification 
was obtained by gel filtration on Sephadex G-50 
superfine in 0.35 M NaCl and additionally by 
ion-exchange chromatography on Fractogel TSK 
CM-650 (S) with a linear NaCl gradient from 0.1 
to 0.5 M. 

Proteins were checked for purity and identified 
by slab gel electrophoresis in the presence of SDS. 

Determination of Stokes radii. Stokes radii of 
gyration, R_s, were determined by analytical gel 
filtration on a Sephadex G-50 superfine column 
(1.7 × 190 cm) in 0.35 M NaCl/20 mM phosphate 
buffer (pH 7.0). The flow rate was 12 ml/h and 
the absorption at 230 nm was recorded continuously. 
The distribution coefficient, k_D, was calculated 
from the void volume ((V_0) determined with 
Dextran blue (2000)), the total available volume 
((V_x) determined with benzamidine hydrochloride), 
and the elution volume (V_e). The calibration line 
for Stokes radii was obtained by plotting the 
inverse error function of (1 - k_D) against R_s as 
described by Ackers [17]. The column was 
calibrated using the following proteins as markers: 
ovalbumin (3.0 nm), chymotrypsinogen A (2.2 nm), 
myoglobin (1.9 nm), cytochrome c (1.61 nm) and 
bovine trypsin inhibitor (1.45 nm). 

Filter binding assays. The filter binding assay 
described in Ref. 18 was modified according to 
Ref. 13. A fixed amount of 3H-labeled DNA and 
increasing amounts of protein were incubated in 
0.1 × SSC buffer, but containing 0.25 M NaCl, for 
15 min at 37 °C. DNA-protein complexes were 
collected onto Millipore filters (0.45 µm, Milford, 
MA, U.S.A.) which were presoaked for 1 h at 
22 °C in 10 mM KCl/1 mM EDTA/5 mM 
2-mercaptoethanol/50 µg/ml BSA. The complexes 
were washed three times with 3 ml portions of 
0.1 × SSC buffer containing 0.25 M NaCl and 
quantified by liquid scintillation counting (Beckman 
LS 7000). The DNA-binding affinity of the 
examined proteins was expressed in percent referring 
to the 100% sample of [3H]DNA without 
protein content. 

Gel-filtration binding experiments. DNA binding 
experiments using size exclusion chromatography 
on a Sephadex G-50 superfine column (2 x 50 cm) 
were carried out as described in Ref. 14. A fixed 
amount of Sulfolobus DNA and protein 7d was 
incubated for 15 min at 67 °C in 'polymix' buffer 
[16]. 1 ml of the sample was injected into the 
column and comigration of the protein with DNA 
was established by analysis of the void volume 
peak by SDS gels. 

Electron microscopy studies. The formation of 
DNA-protein complexes and the preparation of 
samples for electron microscopy by adsorption to 
mica was performed as described in Ref. 19. Variable 
amounts of protein were incubated with double-stranded 
plasmid RSF 1010 and single-stranded 
φX174 DNA in a buffer comprising 10 
mM triethanolamine-HCl/50 mM KCl/2.5 mM 
MgCl2/15 mM 1,4-dithiothreitol (pH 7.5). Complexes 
were fixed with 0.2% (v/v) glutaraldehyde, 
adsorbed to mica and stained with 2% (w/v) 
aqueous uranyl acetate. Rotary shadowing was 
done with platinum-iridium (80 : 20) at an angle of 
about 8°. Electron micrographs were made with a 
Philips electron microscope, model EM 480. 

Enzymatic digestion with trypsin. The protein 
was digested with TPCK-trypsin (enzyme-to-substrate 
ratio, 1 : 50) in 100 mM N-methylmorpholine 
acetate buffer at pH 8.1 for 2 h at 37 °C, with 
gentle stirring. The peptides were separated by 
reversed-phase HPLC (RP-HPLC) on a Vydac C18 
(201 TPB) column (250 × 4 mm) in dilute aqueous 
trifluoroacetic acid using an acetonitrile gradient. 

Cleavage with CNBr. Protein 7d (1 mg) was 
cleaved with 6 mg CNBr in 70% (v/v) formic acid 
for 48 h in the dark under nitrogen at ambient 
temperature. The peptides obtained were separated 
directly by RP-HPLC on a Vydac C4 (214 
TP54) column (250 × 4 mm) with a gradient of 
2-propanol in aqueous 0.1% trifluoroacetic acid, 
or with a Vydac C18 (201 TPB) column (250 × 4 
mm) with an acetonitrile gradient in aqueous 
trifluoroacetic acid. 

Sequence determination. Automatic sequencing 
of the intact protein was done in a liquid phase 
sequencer [20] with on-line detection of the
PTH-amino acids [21] by isocratic HPLC employing a
2-propanol HPLC solvent system [22] or in a 
pulsed gas-liquid phase sequencer [23] (Applied 
Biosystems, model 477A) with on-line detection of 
the PTH-amino acids by HPLC using a gradient 
system (Applied Biosystems PTH-analyzer, model 
120A). Sequence analysis of tryptic peptides was 
performed by manual microsequencing employing 
the DABITC/PITC double coupling method, and 
the amino-acid derivatives were identified by 
two-dimensional thin-layer chromatography 
[24,25]. DABTH-Leu and DABTH-Ile, which 
comigrate on the micro-TLC plates, were identified
by isocratic HPLC [26]. The peptides obtained 
from cyanogen bromide cleavage which carried 
homoserine residues were sequenced in a solid 
phase sequencer employing the homoserine lac- 
tone attachment procedure [27,28]. 

Amino-acid analysis. Hydrolysis of the protein
and peptides was performed in 100 μl 5.7 M HCl
for 24 h at 110 °C. The amino acids were determined
after precolumn derivatization with
o-phthaldialdehyde by RP-HPLC separation as
described in Ref. 29.






Results and Discussion 

The growth of S. solfataricus strain P1, breakage
of cells and isolation of the DNA-binding
proteins were performed as described in the
Experimental procedures. As in S. acidocaldarius
cells [12], three molecular weight classes of
DNA-binding proteins of 7, 8 and 10 kDa have been
isolated from S. solfataricus strain P1. The major
component of the 7 kDa class is the DNA-binding
protein 7d, according to the nomenclature used
for the DNA-binding proteins from S.
acidocaldarius [13].

Fig. 1a shows the protein separation on
CM-Sepharose CL-6B. The fractions containing
protein 7d and an 8 kDa protein are marked. Further
purification of protein 7d was performed by gel
filtration on Sephadex G-50 and by ion-exchange
chromatography on CM-Fractogel TSK as
described in Experimental procedures (chromatograms
not shown). Fig. 1b shows the purified
protein 7d from S. solfataricus P1 on SDS-PAGE
in comparison to 7 kDa DNA-binding proteins
from S. acidocaldarius.

Stokes radii of gyration 

The degree of asymmetry and oligomerisation 
of proteins are easily determined by analytical gel 
filtration [17]. This procedure allows the use of 
low protein concentration in order to avoid 
artefacts such as protein aggregation. The relation 
between the Stokes radius, R_s, and the quaternary



Fig. 1. (a) Separation of the DNA-binding proteins on
CM-Sepharose CL-6B. Pooled fractions for protein 7d and an 8
kDa protein are marked. The NaCl concentration was
increased from 0.40 M to 0.49 M in phosphate buffer (pH 6.0)
within the marked region. (b) Protein 7d derived from S.
solfataricus (this paper) in comparison to 7 kDa proteins from
S. acidocaldarius (this paper and Ref. 15). Lanes 1 and 6 show
TP 50 marker proteins from S. solfataricus; lane 2, protein 7b
from S. acidocaldarius; lane 3, protein 7c from S.
acidocaldarius; lane 4, protein 7d from S. solfataricus; lane 5,
protein 7d from S. acidocaldarius.
TABLE I

THE STOKES RADII OF GYRATION OF THE 7 kDa
DNA-BINDING PROTEINS FROM S. ACIDOCALDARIUS (a)
AND S. SOLFATARICUS (b) DETERMINED
BY ANALYTICAL GEL FILTRATION [17].
The frictional ratio (f/f0) is calculated from the ratio of R_s
and the radius of the equivalent sphere, R_min.

Protein      R_s (nm)   f/f0
                        monomer   dimer   tetramer

Sac7d (a)    1.53       1.20      0.95    0.75

Sso7d (b)    1.56       1.21      0.96    0.73

structure of proteins is the frictional ratio, f/f0,
which can be calculated from the experimental R_s
value and the theoretical minimal radius, R_min,
for a given molecular weight. Table I shows that
in 0.35 M NaCl the 7 kDa proteins are monomers
like the 7 kDa proteins from S. acidocaldarius.
This is also in accordance with results from
1H-NMR experiments (data not shown).
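
The relation between Stokes radius and frictional ratio can be illustrated with a short sketch. It assumes that the theoretical minimal radius is that of an unhydrated sphere with the mass of the protein and a partial specific volume of 0.73 cm^3/g, a typical value for proteins; neither the value nor the function names come from the paper.

```python
import math

AVOGADRO = 6.022e23  # molecules per mole


def minimal_radius_nm(mol_weight, partial_specific_volume=0.73, subunits=1):
    """Radius (nm) of an unhydrated sphere with the mass of the oligomer.

    mol_weight: monomer molecular weight in g/mol.
    partial_specific_volume: cm^3/g; 0.73 is an assumed typical value.
    subunits: 1 for monomer, 2 for dimer, 4 for tetramer.
    """
    volume_cm3 = subunits * mol_weight * partial_specific_volume / AVOGADRO
    radius_cm = (3.0 * volume_cm3 / (4.0 * math.pi)) ** (1.0 / 3.0)
    return radius_cm * 1e7  # cm -> nm


def frictional_ratio(stokes_radius_nm, mol_weight, subunits=1):
    """f/f0 as the ratio of the measured Stokes radius to the minimal radius."""
    return stokes_radius_nm / minimal_radius_nm(mol_weight, subunits=subunits)


# R_s = 1.53 nm and a 7 kDa monomer, evaluated for three assumed states.
for n in (1, 2, 4):
    print(n, round(frictional_ratio(1.53, 7000.0, subunits=n), 2))
```

Under these assumptions the ratios come out close to the tabulated ones. Because a sphere has the lowest possible frictional coefficient for a given mass, a ratio below 1 is not physically meaningful, which is why only the monomer assignment is consistent with the measured radius.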

Filter binding assays 

The original procedure [18] for filter binding 
assays used rather low ionic strength buffer (0.1 X 
SSC), which allows the nonspecific binding of 
basic proteins to nucleic acids by electrostatic 
interactions. In order to avoid this, the NaCl 
concentration of the binding buffer was increased 
to 0.25 M in 0.1 X SSC. It has been shown that at 
this ionic strength, basic proteins like lysozyme, 
cytochrome c or E. coli ribosomal proteins do not
bind to DNA due to their basicity only [13]. Well
established DNA-binding proteins like HU from
E. coli and DNA-binding protein II from Bacillus
stearothermophilus showed a binding capacity of
18% to 20% under these buffer conditions at a
protein/DNA ratio of 25. The whole set of 



DNA-binding proteins from S. acidocaldarius 
clearly demonstrated binding capacities in the 
range of 5% to nearly 80% under the same condi- 
tions [12-14]. The filter binding assay of protein 
7d (Table II) resulted in a DNA-binding affinity 
of about 18% binding capacity referring to the 
100% sample of [ 3 H]DNA without protein content 
at a protein/DNA ratio of 25. This value is slightly 
higher than that of the homologous protein from 
S. acidocaldarius, which can be explained by the 
different amount of methylated lysines. 

The results of the size exclusion experiments 
confirm qualitatively those from filter binding as- 
says. If the protein/DNA ratio is increased drasti- 
cally, free protein is fractionated by the Sephadex- 
G50 superfine column after the void volume peak, 
which contained the protein/DNA complex. The 
same results were obtained using either Sulfolobus 
or E. coli DNA. In the latter case, incubation 
temperature was decreased to 37 °C.

Electron microscopy 

Fig. 2 presents the electron micrographs of 
protein 7d in complexed formation with both dou- 
ble- and single-stranded DNA. The formation of 
the protein-DNA complex results in highly con- 
densed DNA-protein clusters. With increased pro- 
tein/DNA ratios, the isolated clusters on the DNA 
merge more and more into a large central pro- 
tein/DNA cluster, surrounded by loops of free 
DNA. A preference for single- or double-stranded 
DNA was not found. Similar structures have been 
observed for the 7 kDa proteins from S. acidocal- 
darius, which represent a very homogeneous group 
of five DNA-binding proteins [14,15]. All these 
highly similar proteins have been shown to inter- 
act specifically with single- and double-stranded 
DNA, although a sequence specificity has not 
been observed [19]. 



TABLE II

MILLIPORE FILTER BINDING ASSAYS

Increasing amounts of protein were incubated with 0.5 μg 3H-labeled DNA in the presence of 0.25 M NaCl.
The DNA-binding affinity of protein 7d from S. solfataricus is shown. 100% affinity is equivalent to the total amount of [3H]DNA.

Protein/DNA ratio (w/w)     1    5    10    15            20            25
DNA-binding affinity (%)    1    6    10    [illegible]   [illegible]   18
Fig. 2. Electron micrographs of nucleoproteins formed with
protein 7d. Some complexes formed with (ss) DNA (φX 174)
are marked with arrows. Clusters of bound protein on (ds)
plasmid DNA RSF 1010, surrounded by free DNA, could be
observed.



Amino-acid sequence determination 

The complete amino-acid sequence of protein 
7d from the archaebacterium S. solfataricus and 
the strategy employed for the sequence determina- 
tion are shown in Fig. 3. The amino-acid composi- 
tion derived from the sequence is in good agree- 
ment with that obtained from the total hydrolysis 
of the protein (Table III). As derived from the 
amino-acid sequence, protein 7d contains modified
lysines, which were identified as monomethylated
residues: partially modified at positions 4, 6,
60 and 63 and fully methylated at position 62
(see below). 

Occurrence of modified amino acids in the protein 

In the PTH-amino acid identification system of 
the liquid [21,22] and gas-liquid phase sequenator 
[23], a new peak was observed in steps 4, 6, 60, 62 
and 63. This modified derivative was identified 
on-line as ε-monomethyl-PTH lysine in comparison
with an authentic reference.



[Fig. 3 sequence diagram. Rows: SEQ (intact protein), TRY (tryptic peptides T1-T11), CB (CNBr peptides CB1-CB3). Residues 1-21 read Ala-Thr-Val-Lys*-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-Ile-Ser-Lys-Ile-Lys-Lys; the remaining rows are not legible in the scan.]

Fig. 3. Amino-acid sequence of DNA-binding protein 7d from S. solfataricus. Sequences of individual peptides and of the intact protein are
marked by symbols according to the method used: automatic sequencing with a pulsed gas-liquid phase sequencer [23] or a liquid-phase sequencer
[20-22]; the manual liquid-phase DABITC/PITC double coupling method [24,25]; solid-phase sequencing after homoserine-lactone
attachment to aminopropyl glass (APG) [27,28]. TRY and CB indicate peptides derived from digestion with trypsin or cleavage
with CNBr. Lys* indicates the Nε-monomethylated lysines.
Furthermore, experiments with lysine derivatives
showed that this unusual amino acid comigrates
with the authentic o-phthaldialdehyde
derivative of ε-monomethyl lysine in the amino-acid
analyzer [15]. Fig. 4 shows the HPLC separation




TABLE III

AMINO-ACID ANALYSIS OF THE DNA-BINDING PROTEIN
7d FROM S. SOLFATARICUS

n.d., residues not determined by amino-acid analysis.

            Number of residues derived by amino-acid:
            sequence        analysis (a)

Asp         [illegible]     2.6
Asn
Glu         1               9.0
Gln         2
Ser         3               2.4
Gly         7               7.6
Thr         3               2.4
Arg         2               2.3
Ala         3               3.0
Tyr         2               1.7
Trp         1               n.d.
Met         2               1.2
Val         5               5.6
Phe         2               1.6
Ile         3               2.9
Leu         3               3.1
Lys (b)     14              12.6
Pro         1               n.d.

(a) The values given are not corrected for destruction of amino
acids or incomplete hydrolysis.
(b) Lys refers to the sum of lysine and monomethylated lysine.
Due to the presence of incompletely modified lysines, the
value for lysines by amino-acid analysis cannot be calculated
precisely.

of a standard amino-acid mixture plus ε-monomethylated
lysine after o-phthaldialdehyde derivatization.
The additional peak which migrates between the
threonine and arginine derivatives was determined
to be ε-monomethyllysine, whereas ε-dimethyllysine
migrated after the arginine derivative
and ε-trimethyllysine before glycine.

Fig. 4. (a) Separation of 100 pmol of a reference amino-acid
mixture containing Nε-monomethylated lysine, after
ortho-phthaldialdehyde precolumn derivatization, by reversed-phase
HPLC, using a column (250 × 4 mm) filled with Shandon
Hypersil ODS 5 μm material. Buffer A was 12.5 mM Na2HPO4
(pH 7.2), and buffer B was 3% tetrahydrofuran in methanol
[27]. The peak which appears between threonine and arginine
comigrates with authentic ε-monomethylated lysine (K*). (b)
The amino-acid composition of protein Sso7d after total
hydrolysis. The separation of the amino acids was as described in
Fig. 4a. The characteristic peak for Nε-monomethyl lysine
(K*) appears at the same position in the chromatogram. (c)
The amino-acid composition of the C-terminal peptide (CB3)
after acid hydrolysis. The separation of the amino acids was as
described in Fig. 4a. The peak marked with an asterisk shows
the ε-monomethylated lysine residue.

Fig. 4b shows the separation of the amino-acid
derivatives of protein 7d produced after
amino-acid hydrolysis. Between the arginine and
threonine o-phthaldialdehyde derivatives, the
ε-monomethyllysine of the hydrolysate of the
DNA-binding protein 7d can be identified.

Separation of tryptic peptides and N-terminal sequence region

Fig. 5 demonstrates the separation of the tryptic
peptides by RP-HPLC with a Vydac C18 column.
Some peptides with the same amino-acid composition
except for the lysine content elute at different
retention times. This effect is probably caused by
the different degree of methylation of lysine
residues. Sequence information and
o-phthaldialdehyde amino-acid determination demonstrate that
the peptides T1_2 and T1_4 have Lys-4 modified,
with the sequence Ala-Thr-Val-Lys* (pos. 1-4,
see Fig. 3), while peptide T1_1 contains an
unmodified lysine residue with the sequence
Ala-Thr-Val-Lys. Peptide T1_3 is a mixture of the
peptides T1_1 and T1_2. Peptide T2, Phe-Lys* (pos.
5-6, see Fig. 3), is found in one position only. The
degree of methylation, derived from the sequence
of the intact protein and estimated by peak height,
is approx. 90% for Lys-4 and 83% for Lys-6.
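
An estimate of this kind can be sketched as the share of the methylated peak in the summed peak heights at one sequencing step. The paper does not spell out its calculation, so the formula, the equal-response assumption and the example heights below are all illustrative.

```python
def methylation_degree(methyl_peak_height, unmodified_peak_height):
    """Percent monomethylation at one position from two peak heights.

    Assumes both derivatives give the same detector response per mole,
    so that peak height is proportional to amount (an assumption).
    """
    total = methyl_peak_height + unmodified_peak_height
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return 100.0 * methyl_peak_height / total


# Hypothetical peak heights in arbitrary units.
print(methylation_degree(45.0, 5.0))
```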

The appearance of peptide T7 (pos. 28-39), 
which does not possess modified lysines, at two 
different positions may be due to partial oxidation 
of methionine. The degree of modification at Lys- 
60 appears to be the crucial factor for the elution 
of peptide T10 (pos. 52-60) at different positions. 
Amino-acid analysis of this peptide has shown 
that peptides T10_1 and T10_2 differ only at Lys-60,
namely T10_1 contains unmodified lysine, while
Lys-60 in T10_2 is monomethylated.

C-terminal peptide regions

The peptides produced after CNBr cleavage 
were separated by RP-HPLC either on a Vydac C4
or C18 column as described in Experimental
procedures. The C-terminal peptide (CB3) (pos.
58-63) was isolated by using the Vydac C18
column and the homoserine peptides CB1 (pos. 1-28)
and CB2 (pos. 29-57) by a Vydac C4 column.
From the sequence determination and amino-acid 
analysis (Fig. 4c) of CB3, the following primary 
structure was derived: 58-Leu-Glu-Lys*-Gln- 
Lys*-Lys*-63. The degree of monomethylation, as 
estimated by peak height, is approx. 90%, 100% 
and 58% for lysine residues 60, 62 and 63, respec- 
tively. The number of lysine residues in the C- 
terminal peptide was substantiated by fast atom 
bombardment mass spectrometry [30]. 




Fig. 5. Separation of the 20 nmol peptides derived by tryptic digestion of protein Sso7d by HPLC. The peptides were
chromatographed on a Vydac C18 (201 TPB) column (250 × 4 mm) in a solvent system of 0.1% trifluoroacetic acid/acetonitrile. The
gradient applied was 100% A for 10 min, 0-50% B in 180 min, 50-100% B in 20 min, 100% B for 5 min and 100-0% B in 5 min.
Measurements were made at 220 nm, 0.16 arbitrary units (full scale), at a flow rate of 1.0 ml/min.
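
The gradient program in this legend can be written out as a piecewise-linear function of time. The sketch below only restates the five steps given in the legend; the data layout and the function name are illustrative choices.

```python
# (duration in min, %B at start, %B at end) for each step of the program.
GRADIENT = [
    (10.0, 0.0, 0.0),      # 100% A hold
    (180.0, 0.0, 50.0),    # 0-50% B
    (20.0, 50.0, 100.0),   # 50-100% B
    (5.0, 100.0, 100.0),   # 100% B hold
    (5.0, 100.0, 0.0),     # 100-0% B
]


def percent_b(t_min, program=GRADIENT):
    """Percent of solvent B at time t_min, by linear interpolation."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    start = 0.0
    for duration, b_start, b_end in program:
        if t_min <= start + duration:
            fraction = (t_min - start) / duration
            return b_start + fraction * (b_end - b_start)
        start += duration
    return program[-1][2]  # after the program ends, hold the final value


for t in (5, 100, 200, 212.5, 220):
    print(t, percent_b(t))
```

The program runs for 220 min in total, with the shallow 0-50% B ramp (about 0.28% B per minute) taking up most of the run.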
Because of the methylation of the lysines found
here in the S. solfataricus 7d protein (Sso7d), the
homologous 7d protein derived from S.
acidocaldarius (Sac7d) was also examined for lysine
modifications not previously identified [14]. We
reinvestigated the Sac7d protein by liquid phase
sequencing and isolation of the C-terminal CNBr
fragment, and found Nε-monomethylated lysines
at positions 4 and 6 (approx. 20% and 50%,
respectively). However, in contrast to Sso7d, the
corresponding Sac7d protein showed no modified
lysine residues in the C-terminal sequence region.

Secondary structure predictions 

The secondary structure of protein 7d was
predicted from the amino-acid sequence. Four
different prediction methods according to Ref. 31
were used to calculate the conformational states
(Fig. 6). This protein possesses a higher amount of
α-helical domains (about 35%) as compared to other
7 kDa DNA-binding proteins from S. acidocaldarius,
for which only about 15% helix content was
calculated.

Fig. 6. Secondary structure of DNA-binding proteins 7d from S. acidocaldarius and S. solfataricus as predicted by four different
methods. The symbols represent residues in α-helical, β-sheet, β-turn and random coil conformations.
The line Avg summarizes the secondary structure obtained when at least three of the four predictions are in agreement.
The amino-acid sequences of the proteins are shown at the bottom line in the one-letter code. Sch, method according to Burgess et al.
[33]; C&F, Chou and Fasman [34]; Nag, Nagano [35]; Rob, Robson and Suzuki [36].

Fig. 7. Structural homology between the 7d DNA-binding protein from S. solfataricus and the DNA-binding proteins 7a, 7b, 7e and
7d from S. acidocaldarius cells. The alignment scores (SD units) calculated by the program ALIGN [32] using the standard mutation
data matrix (100 random runs and a break penalty of 20) are:

7d S. solfataricus - 7a S. acidocaldarius: 30.93
7d S. solfataricus - 7b S. acidocaldarius: 29.54
7d S. solfataricus - 7d S. acidocaldarius: 32.63
7d S. solfataricus - 7e S. acidocaldarius: 30.23

Gaps are shown as ... .
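
The averaging rule used for the Avg line of Fig. 6 (a state is accepted when at least three of the four methods agree) can be sketched as a per-residue vote. The one-letter state codes, the placeholder for undecided residues and the example strings are assumptions made for illustration and are not data from the figure.

```python
from collections import Counter


def consensus(predictions, min_agree=3, undecided="-"):
    """Per-residue consensus of several secondary-structure predictions.

    predictions: equal-length strings, one per method, using one letter
    per residue (here H = helix, E = sheet, T = turn, C = coil; the
    letter codes are an assumption of this sketch).
    """
    lengths = {len(p) for p in predictions}
    if len(lengths) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for states in zip(*predictions):
        state, count = Counter(states).most_common(1)[0]
        result.append(state if count >= min_agree else undecided)
    return "".join(result)


# Made-up predictions for a six-residue stretch, not data from Fig. 6.
methods = ["HHHECC", "HHHEET", "HHCEEC", "HHHTEC"]
print(consensus(methods))
```

A helix content such as the quoted 35% would then be the number of helix positions in the consensus string divided by the sequence length.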

Homology to other DNA-binding proteins

By sequence comparison, we found a strong 
degree of homology between protein 7d from S. 
solfataricus and the proteins from the 7 kDa group 
from the archaebacterium S. acidocaldarius (Fig. 
7), using the programme ALIGN [32]. No signifi- 
cant homology between protein 7d from S. solfa- 
taricus and DNA-binding proteins from other 
organisms has been found. 
The crystal structure of the hyperthermophile
chromosomal protein Sso7d bound to DNA



Sso7d and Sac7d are two small (~7,000 M_r), but abundant,
chromosomal proteins from the hyperthermophilic archaebacteria
Sulfolobus solfataricus and S. acidocaldarius, respectively. These
proteins have high thermal, acid and chemical stability. They
bind DNA without marked sequence preference and increase
the T_m of DNA by ~40 °C. Sso7d in complex with GTAATTAC
and GCGT(iU)CGC + GCGAACGC was crystallized in different
crystal lattices and the crystal structures were solved at high
resolution. Sso7d binds in the minor groove of DNA and causes a
single-step sharp kink in DNA (~60°) by the intercalation of the
hydrophobic side chains of Val 26 and Met 29. The intercalation
sites are different in the two complexes. Observations of this
novel DNA binding mode in three independent crystal lattices
indicate that it is not a function of crystal packing.

How sequence-general DNA-binding proteins bind to DNA
is a fundamental question for understanding genome structure.
However, few structures of sequence-general DNA-binding
proteins bound to DNA are known. The high thermal,
acid and chemical stability associated with Sso7d and Sac7d (ref. 1;
Fig. 1a) makes them an attractive system for structural,
thermodynamic and DNA-binding studies (refs 2-5). Sac7d and Sso7d have
unfolding temperatures of greater than 90 °C (at pH 7.5, 0.3 M
KCl) and both are acid stable with T_m values of >60 °C at pH 0. The



[Fig. 1a: aligned amino-acid sequences of Sso7d and Sac7d, annotated with strands β1-β5 and the C-terminal α-helix; the alignment is not legible in the scan.]

Fig. 1 a, Amino acid sequences of recombinant
Sso7d and Sac7d. b, Ribbon diagram of the
Sso7d-GCGT(iU)CGC + GCGAACGC complex. All side
chains of Sso7d are shown. The four bridging water
molecules are shown as large purple spheres. DNA is
colored red for the first two base pairs and green for the
remaining six base pairs, separated by the intercalating
amino acids (yellow). c, Superposition of three Sso7d
structures from the Sso7d-GCGT(iU)CGC + GCGAACGC
complex (yellow), the Sac7d-GCGATCGC complex (ref. 8)
(green) and the NMR solution structure (ref. 6) (red).
Fig. 2 a, Stereoscopic surface drawing of the
electrostatic potential of the Sso7d-GCGT(iU)CGC +
GCGAACGC complex. The charge distribution of
Sac7d was calculated in the absence of DNA.
Sso7d is positively charged (+6), resulting from 14
lysines, two arginines, seven glutamic acids and
three aspartic acids on the protein surface.
However, the complexes are negatively charged
(-8) overall due to the additional 14 negative
DNA phosphate charges. There is no apparent
correlation between the monomethylation sites
of lysines (Lys 5 and Lys 7) and the binding
interface. Four bridging waters are found in the space
between the protein and DNA. b, c, Detailed
views at the protein-DNA interface of the
Sso7d-GCGT(iU)CGC + GCGAACGC (left) and
Sso7d-GTAATTAC (right) complexes. Selected
side chains of Sso7d (red), three DNA base pairs
(green) and four bridging water molecules
(purple) are shown.
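
The charge bookkeeping in this legend can be checked with a few lines. The sketch counts only the formal charges named in the legend; ignoring histidines, chain termini and pKa shifts is a simplification introduced here.

```python
def net_charge(lys, arg, glu, asp, dna_phosphates=0):
    """Formal net charge: +1 per Lys/Arg, -1 per Glu/Asp/phosphate.

    Ignores His, the chain termini and any pKa shifts (simplifying
    assumptions of this sketch).
    """
    return (lys + arg) - (glu + asp) - dna_phosphates


protein = net_charge(lys=14, arg=2, glu=7, asp=3)
complex_ = net_charge(lys=14, arg=2, glu=7, asp=3, dna_phosphates=14)
print(protein, complex_)  # 6 -8
```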



solution structures of Sso7d (ref. 6) and Sac7d (ref. 7), determined by NMR
analyses, are similar to each other and they consist of an
incomplete five-stranded β-barrel capped by an amphiphilic α-helix
abutting the β3-β4-β5 strands.

Both proteins bind to DNA without marked sequence preference
and increase the T_m of DNA by ~40 °C. However, their
DNA-binding mode has remained unclear until recently (ref. 8).
Baumann et al. proposed that the β3-β4-β5 sheet is the putative
DNA binding surface (ref. 9). McAfee et al. have shown that Sac7d
binds to DNA with an average ratio of four base pairs per
monomer of Sac7d with no cooperativity. Circular dichroism
data also indicated that Sac7d induces a sequence-dependent
cooperative structural transition in DNA. Another unusual
property is the ribonuclease (RNase) activity associated with Sso7d,
which has been called ribonuclease P2 (ref. 10). However, similar studies
on Sac7d did not produce conclusive evidence of any RNase
activity (unpublished results of J.W.S.).

We recently determined the crystal structures of two
Sac7d-DNA complexes which revealed an unexpected DNA
minor groove binding mode of Sac7d with the DNA duplex
sharply kinked (ref. 8). Here we present the results of a parallel study on
the structure determination of two Sso7d-DNA complexes. The
complexes were crystallized in two new crystal lattices which
afford us an excellent opportunity to compare the structure and
DNA binding properties of not only the same protein (Sso7d) in
different environments, but also different proteins (that is, Sso7d
versus Sac7d). The structures are also compared with a recent
Sso7d-DNA structure by NMR analysis (ref. 11).

Overall structure of the complex 

The crystal structures of two Sso7d-DNA complexes,
Sso7d-GCGT(iU)CGC + GCGAACGC (iU = 5-iodo-deoxyuridine)
and Sso7d-GTAATTAC, have been solved and well refined at high
resolution (Table 1). All φ/ψ angles of the Ramachandran plot and
other conformational parameters in both complexes fall within the
acceptable regions. The Sso7d binding sites in DNA are sharply
kinked and located at different places in the two complexes: at the
C2pG3 step in the Sso7d-GCGT(iU)CGC + GCGAACGC complex
(Fig. 1b) and at the A3pA4 step in the Sso7d-GTAATTAC
complex. The protein covers four bases and significantly
widens the DNA minor groove. The other end of the DNA
duplex remains B-DNA-like. These two complexes have different
crystal packing interactions, indicating that the observed novel
DNA binding mode is not a result of crystal packing and is an
accurate reflection of the preferred protein-DNA interaction.

The structures of the bound Sso7d in both complexes are very
similar to each other, with an r.m.s.d. of 0.51 Å (using Cα atoms of
residues 2-60), and are generally similar to that of the free Sso7d
determined by 2D-NMR analysis (ref. 6), with an r.m.s.d. of ~2.10 Å
(using Cα atoms of residues 2-60). Some differences exist in the
orientation of the β1-β2 β-hairpin and in the conformations of
the C-terminal α-helix (Fig. 1c).
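
For reference, an r.m.s.d. over matched Cα atoms can be computed as in the sketch below. It assumes the two coordinate sets have already been optimally superimposed (the fitting step, for example a least-squares rotation, is not shown), and the coordinates are made-up values, not taken from the structures.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) coordinates that have already been superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Three made-up atom positions in Å, not coordinates from the structures.
model_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_2 = [(0.0, 0.5, 0.0), (3.8, -0.5, 0.0), (7.6, 0.5, 0.0)]
print(rmsd(model_1, model_2))
```

For the comparison quoted in the text, each list would hold the 59 Cα positions of residues 2-60.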

The molecular surface of Sso7d is irregular with numerous
ridges and valleys (Fig. 2a). The excellent matching of shapes and
charges between Sso7d and DNA in the complexes is evident. A
long groove is visible which is occupied by DNA in the complex.
There is also a significant crater created by the crossing of the
β3-β4-β5 triple-stranded β-sheet and the β1-β2 β-hairpin.

Sso7d has an OB-fold topology (ref. 12), with a small hydrophobic core
of only 11 residues (<25% solvent accessibility). Four glycines
(Gly 10, 27, 38 and 39) are located in the loop regions. Many
hydrophobic amino acids are solvent exposed (>45% solvent
accessibility). The surface hydrophobic amino acids Trp 24, Val
26, Met 29, and Ala 45 are involved in DNA binding contacts.
Fig. 3 a, Detailed local structures
at the protein-DNA interface
of the Sso7d-GCGT(iU)CGC +
GCGAACGC complex. Selected
side chains of Sso7d are shown.
b, Schematic diagram summarizing
all the important Sso7d-DNA
contacts. The filled, open and
dashed arrows represent direct
hydrogen bonds/salt bridges, van
der Waals close contacts, and
potential hydrogen bonds/salt
bridges, respectively.



There are two 3_10-turns that allow the protein's main chain to
change direction abruptly. The C-terminal helix is solvent
exposed. A notable feature is the triple-stranded β-sheet
(β3-β4-β5), whose interactions with DNA are summarized in Fig. 3.

Bound DNA has a sharp kink 

The DNA is severely kinked (Fig. 4) by the bound Sso7d, as in the
Sac7d-DNA structures (ref. 8). This type of DNA kink has been observed
in the complexes of TBP, LEF-1 and SRY (refs 13-16; the latter two are
HMG-box containing proteins) with their cognate specific DNA sequences,
but it differs from that induced by proteins that bend DNA more
smoothly (ref. 17). The induced local DNA deformation is similar among
different protein-DNA complexes, despite the different protein
motifs. It should be noted that the ~61° single-step kink in the
Sso7d-DNA and Sac7d-DNA complexes is the largest among all
known structures of protein-DNA complexes. The solution
structure of the Sso7d-CTAGCGCGCTAG complex has been analyzed
by NMR recently (ref. 11) and the DNA was found to be bent by 30°,
significantly less than in the crystal structures. The
difference may be the result of the NMR refinement using a limited
number of observed NOE crosspeaks between Sso7d and DNA due
to the fast exchange between the free and bound DNA/protein.

The bound DNA has a varying degree of helix unwinding at steps
surrounding the intercalation sites (-14° at C2pG3, -14° at G3pA4
and -12° at T4piU5). There is also a slight roll (11°) between the
G3-C14 and A4-T13 base pairs, thus creating a total bend of 72°.
Many nucleotides surrounding the wedge site adopt the less-common
C3'-endo (N-type) sugar puckers: C2 (N), G3 (S), T4 (N) and
iU5 (N) in one strand and G15 (S), C14 (N), A13 (N) and A12 (S)
in the other strand. The Sso7d-GTAATTAC complex has the same
pattern.

The DNA distortion seen in the complex described here most
likely represents the structural transition observed by McAfee et al.
using CD spectroscopy for the Sac7d system and the large heat
capacity change upon DNA binding observed by Lundback et al.
Such a structural transition (unwinding and/or bending)
induced in DNA by Sac7d is cooperative in the sense that
two proteins must be bound within a specified distance
(for example, 5 base pairs in duplex poly(dA-dT)) before the
transition occurs. The inherent resistance to the transition is
apparently negligible in short DNA sequences. Our preliminary 1D-NMR
titration of Sac7d to cisplatin-lesioned DNA indicates tight
binding between Sac7d and the pre-kinked DNA, supporting the novel
binding mode observed in the crystal structures (unpublished
data).

Protein-DNA interface 

The binding of Sso7d to the minor groove of DNA involves a large 



Fig. 4 Stereoscopic view of the intercalation
sites. The local structures of the
two Sso7d-DNA complexes are superimposed.
The DNA octamer is kinked
61° at the C2pG3 step in the
Sso7d-GCGT(iU)CGC + GCGAACGC
complex and 62° at the A3pA4 step in
the Sso7d-GTAATTAC complex. The
sharp kink is due to the intercalation of
Val 26 and Met 29 amino acid side
chains into DNA base pairs from the
minor groove direction, widening the
minor groove at this step. The insertions
of the β4-Met 29 and β3-Val 26 amino
acid side chains are ~1.5 Å deep. The
side chain of Met 29 lies close to the
base pair with the S-CH3 moiety
wedged between the C14 and G15
bases. Similarly the side chain of Val 26
is wedged between the C2 and G3
bases, with each of the CH3 groups
pointing toward a base.
binding surface area of about 20 Å ×
20 Å (Fig. 2a). A significant component
of the free energy of binding is
due to non-electrostatic interactions,
made in large part by Trp 24, Val 26
and Met 29 (Fig. 4). In addition to the
obvious role of the β4-Met 29 and
β3-Val 26 amino acids, the single
tryptophan (β3-Trp 24) plays multiple roles.
First, its bulky ring fills up the space
between DNA and Sso7d. Second, its
indole NH group forms a specific
hydrogen bond (2.93 Å) to the N3 of
the G3 base, anchoring G3 in its open
(unstacked) position. Ala 45 also
makes a close van der Waals contact
with the deoxyribose of C14, suggesting
the requirement of a small
hydrophobic side chain of alanine at
position 45. Ser 31 receives a hydrogen
bond (2.87 Å) from the N2 amino
group of the G3 nucleotide.
Interestingly, in the Sso7d-GTAATTAC
complex Ser 31 forms a hydrogen
bond to the sulfur atom of Met 29.

The guanidinium group of Arg 43
is hydrogen bonded to the O2 atom
of iU5. Arg 43 is held in its place with
the aid of Tyr 8, whose aromatic ring
is stacked on the deoxyribose of A13.
The phenolic OH group of Tyr 8 is linked to the N3 of A13
through a bridging water. The hydrogen bond between Arg 43 and
the O2 atom of a thymine appears to be important and may determine
the polarity of the Sso7d binding mode. The structure of the
Sso7d-GCGT(iU)CGC + GCGAACGC complex shows that the
Arg 43 of Sso7d is hydrogen bonded to iU5 of the TT strand, not
to the AA strand. Therefore a combination of the specific interaction
between a guanine base and Ser 31, and the hydrogen bond
between Arg 43 and iU5-O2, may be important in favoring the
intercalation site at the C2pG3 step in this complex.

The formation of the complex is accomplished by specific
hydrogen bonds/salt bridges (Fig. 3). The number of salt bridges
between the protein and DNA is in excellent agreement with the
five ionic interactions predicted by the salt back-titrations of the
Sac7d complex (ref. 3) using the theory of de Haseth et al. (ref. 18). A
somewhat smaller value has been determined by salt-dependent
isothermal titration calorimetry on the binding between Sso7d
and poly(dG-dC).

An important question is how Sso7d and Sac7d bind to
DNA in a sequence-general manner. The answer may lie in the
bridging water molecules found in the buried cavity located
between protein and DNA (Fig. 2b,c). This cavity permits the
G-C base pairs to be bound without steric clash due to the
additional N2 amino groups, thus endowing the protein with a
property required for its sequence-general binding to DNA. For
example, in the Sso7d-GCGGTCGC + GCGACCGC complex
(which has a G-C base pair, instead of an A-T base pair, at the
fourth position in the sequence), we observed fewer intervening
water molecules with a concomitant movement of DNA base
pairs (unpublished results). It is interesting to note that bridging
water molecules play an important role in modulating the
sequence-general binding of Sac7d and Sso7d by acting as filler,
whereas they play an entirely different role as specific linkers



Table 1 Crystal and refinement data of two Sso7d-DNA complexes

                                   Sso7d +        I-dU-02   I-dU-06   Sso7d +
                                   GTAATTAC                           GCGT(iU)CGC
                                                                      + GCGAACGC
Crystal data
  a (Å)                            47.60          47.52     47.78     46.87
  b (Å)                            50.77          50.76     50.91     49.67
  c (Å)                            42.06          41.97     42.03     37.65
  Resolution (Å)                   2.0            2.0       2.0       1.7
  # reflections (>1.0σ(F))         7,607          7,499     7,669     11,959
  Temperature (°C)                 20             20        20        -150
  Rmerge (%)                       7.53           6.37      7.33      7.37
  Completeness (%)                 94.1           92.9      95.7      84.3
  Completeness at highest
  shell for >2.0σ(F) (%)           83.0                               90.6
                                   (2.0-2.1 Å)                        (1.70-1.78 Å)
  Wilson B-factor (Å²)             32.6           29.7      32.1      17.8
  Mean overall figure of merit     0.83

Refinement data
  # reflections (>2.0σ(I))         5,682                              9,488
  Rwork/Rfree (10% data)           0.168/0.268                        0.203/0.283
  R.m.s.d. bond distance (Å)
  (Sso7d/DNA)                      0.010/0.007                        0.014/0.009
  R.m.s.d. bond angle (°)
  (Sso7d/DNA)                      1.37/1.20                          1.81/1.34
  No. of atoms (Sso7d/DNA)         510/322                            502/322
  No. of waters                    99                                 114


between protein and DNA in defining the sequence specificity in 
the Trp repressor-DNA recognition 19. 

Biological implication 

The structures of the Sso7d-DNA and Sac7d-DNA complexes offer new
insights into the possible role of several classes of DNA binding
proteins in transcription regulation. Some of those proteins,
including TBP 13, SRY 15, LEF1 16 and PurR 20, bind in the minor
groove of DNA and kink the DNA duplex to a different degree 17.
Additionally we noted a possible structural alignment between
Sso7d/Sac7d and the cold shock proteins CspA/CspB 21,22. Both CspA
and CspB are related to a class of proteins called Y-box proteins,
which have a widespread and highly conserved nucleic acid-binding
motif occurring from bacteria to humans 23. Therefore this structural
alignment between Sso7d/Sac7d and CspA/CspB may be significant in
understanding the Y-box proteins.

The new DNA binding mode of Sso7d/Sac7d may also offer a clue for
understanding the packaging of DNA in archaebacteria. Several models
of the polymeric Sso7d-DNA complex with different protein/DNA ratios
can be constructed by using the structure of the complex observed in
the crystals. Previously we presented a model in which the DNA is
maximally loaded with Sac7d proteins 4. Additional modeling studies
showed that if the number of base pairs per protein monomer is
increased (for example, to 10 base pairs per protein), many
possibilities for DNA condensation may exist (data not shown).

Our study augments the understanding of chromatin structure in
Archaea. On the one hand, histones 24 or histone-like proteins (for
example, HMf) 25 form nucleosomes. On the other hand, Sso7d/Sac7d may
bind to DNA in the minor groove and form higher ordered structures.
Thus two different types of DNA compaction mechanisms may be possible
in the Archaea: the mechanism described here with Sac7d/Sso7d, which
may be representative of the Crenarchaeota, and a nucleosome-like
structure for the HMf-class of proteins found in the
Euryarchaeota 26,27.

Interestingly, the bacterial HU protein has a different way of
forming chromatin structure. The crystal structure of the complex
between the integration host factor (IHF) and DNA revealed that IHF
induces two prominent kinks in the bound DNA, forming a U-turn 28, by
the partial intercalation of a proline from each of the two long
β-hairpins which wrap around the DNA. The sequence and structural
homology between IHF and HU suggest that HU may organize chromatin
using a minor-groove binding mode through intercalation.

Methods 

The purified Sso7d protein was dialyzed against de-ionized water and
lyophilized. The complexes were crystallized using the vapor
diffusion method. The Sso7d-GTAATTAC complex and the two iodo
derivatives were crystallized from 1.3 mM Sso7d, 1.3 mM DNA duplex,
2 mM Tris-Cl buffer (pH 6.5), 2.6% PEG400 solution, equilibrated with
15% PEG400. The Sso7d-(GCGTTCGC + GCGAACGC) and iodo complexes were
crystallized under similar conditions except 5%
2-methyl-2,4-pentanediol (MPD) solution was used and the solution
equilibrated with 20% MPD. Data were collected either at room
temperature (20 °C) or at -150 °C on a Rigaku R-Axis IIc image plate
area detector system to various resolution ranges (Table 1). The
crystals of both complexes are in the space group P2₁2₁2₁. The data
were processed using the software provided by Molecular Structure
Corporation.
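
As a quick consistency check on the crystal data, the unit cell volume of an orthorhombic cell is simply the product of its three edges. The sketch below applies this to the cell edges listed in Table 1 for the two native complexes. The function and variable names are illustrative only, and the value of four asymmetric units per cell is the standard count for space group P2₁2₁2₁ rather than a number stated in the text.

```python
# Orthorhombic cell: all angles are 90 degrees, so V = a * b * c.
# Cell edges (in angstroms) are taken from Table 1; names are illustrative.

ASYMMETRIC_UNITS_PER_CELL = 4  # general positions in space group P2(1)2(1)2(1)

CELLS = {
    "Sso7d + GTAATTAC": (47.60, 50.77, 42.06),
    "Sso7d + GCGT(iU)CGC + GCGAACGC": (46.87, 49.67, 37.65),
}


def orthorhombic_volume(a, b, c):
    """Return the unit cell volume in cubic angstroms."""
    return a * b * c


def volume_per_asymmetric_unit(a, b, c, z=ASYMMETRIC_UNITS_PER_CELL):
    """Return the cell volume available to one asymmetric unit."""
    return orthorhombic_volume(a, b, c) / z


for name, (a, b, c) in CELLS.items():
    volume = orthorhombic_volume(a, b, c)
    per_unit = volume_per_asymmetric_unit(a, b, c)
    print(f"{name}: V = {volume:.0f} A^3, V/Z = {per_unit:.0f} A^3")
```

The smaller cell of the low-temperature crystal shows up directly as a smaller volume per asymmetric unit.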

The phases were determined by the multiple isomorphous replacement
(MIR) method using two iodo derivatives (denoted as I-dU-02 and
I-dU-05 with the iodine located at positions T2 and T5 respectively)
for the Sso7d-GTAATTAC complex. The figure-of-merit weighted MIR map
with solvent flattening at 2.5 Å resolution clearly revealed both the
DNA and the Sso7d protein electron density. At that point the refined
structure of the Sac7d-GTAATTAC complex was used to model the
Sso7d-GTAATTAC complex into the MIR electron density. The model was
appropriately corrected against the unbiased map. The structure was
refined by the simulated annealing (SA) procedure incorporated in
X-PLOR 29 using the data with |F| > 4σ(F) in the 6.0-1.9 Å range.
Simulated annealing and individual temperature factor refinements
were carried out by X-PLOR. Well-ordered water molecules were located
and gradually included in the model.

Crystals of the complex between Sso7d and GCGTTCGC + GCGAACGC and the
iodo-dU derivatives were obtained. It was found that the sequence
GCGT(iU)CGC + GCGAACGC produced the best crystals and a 1.6 Å
resolution data set was collected at -150 °C. The structure of the
complex was solved by the molecular replacement method using the
AMoRe package in the CCP4 suite 30. Similar SA refinement was carried
out with a final R-factor (working set) of 20.3% using the
|F| > 4σ(F) data in the 6.0-1.6 Å range.

Programs O 31, MIDAS Plus (University of California at San Francisco)
and QUANTA (version 4.0, Molecular Simulation, Burlington, MA) were
used to examine the electron density maps and molecular models. The
electrostatic potential diagram was calculated by GRASP 32. DNA force
field parameters of Parkinson et al. 33 were used. All structures
have been refined by SA and individual B-factor refinement in X-PLOR.
During the refinement, some rebuilding of the model was necessary to
improve the fitting of the model to the electron density. The crystal
data and refinement summaries are listed in Table 1.



Coordinates. The atomic coordinates of the two Sso7d-DNA complexes
have been deposited in the Brookhaven Protein Data Bank (accession
numbers 1BNZ and 1BF4 respectively).
Solution structure and DNA-binding 
properties of a thermostable 
protein from the archaeon 
Sulfolobus solfataricus 



Herbert Baumann, Stefan Knapp, Thomas Lundbäck, Rudolf Ladenstein and 
Torleif Härd 

The archaeon Sulfolobus solfataricus expresses large amounts of a small basic
protein, Sso7d, which was previously identified as a DNA-binding protein possibly
involved in compaction of DNA. We have determined the solution structure of
Sso7d. The protein consists of a triple-stranded anti-parallel β-sheet onto which an
orthogonal double-stranded β-sheet is packed. This topology is very similar to that
found in eukaryotic Src homology-3 (SH3) domains. Sso7d binds strongly (Kd < 10
µM) to double-stranded DNA and protects it from thermal denaturation. In
addition, we note that ε-mono-methylation of lysine side chains of Sso7d is
governed by cell growth temperatures, suggesting that methylation is related to
the heat-shock response.
DNA in a random coil conformation occupies a volume that is almost
always much larger than the cell in which the molecules are
contained. Thus, the DNA of all cells must be structurally organized
in a compact form and yet be readily available for transcription. In
the nucleus of eukaryotic cells the genomic DNA is packed by histone
proteins into nucleosomes, which in turn form the higher-order
structures of chromatin 1. The structural organization of DNA in
prokaryotes is somewhat less well understood. Archaea and bacteria
contain abundant small, basic proteins which are believed to be
involved in packing and unpacking, maintenance and control of the
genomic DNA (see refs 2-3 for reviews), one of the best characterized
being the HU protein from Escherichia coli. Some of these proteins
are also clearly evolutionarily related to eukaryotic histones (ref.
6 and work cited therein). Others are believed to have more
specialized functions, such as to bend the DNA at specific sites.

The thermoacidophilic archaeon Sulfolobus, which can be isolated from
volcanic hot springs, expresses several small, basic proteins. These
proteins were first reported by Thomm et al. (ref. 8) and were
subsequently isolated, characterized and sequenced by Reinhardt and
co-workers. The basic proteins isolated from Sulfolobus
acidocaldarius can be grouped into three molecular weight classes of
7,000, 8,000 and 10,000 Mr (7, 8 and 10 kDa), respectively. The 7 kDa
proteins can be further separated according to their basicity.
Sequences are known for the major component of the 7 kDa class in S.
solfataricus (Sso7d) and the corresponding protein (Sac7d) as well as
three minor components (Sac7a, Sac7b, and Sac7e) in S.
acidocaldarius. The sequences of these proteins are compared in
Fig. 1a. The proteins are all very rich in lysine residues: 14
residues out of 63 in
Sso7d are lysines. Lysine residues at the amino and carboxy termini
(residues 4, 6, 60, 62 and 63 in Sso7d) are subjected to
ε-mono-methylation within the cell.

The function of the 7 kDa class of proteins in Sulfolobus is not
known. The initial reports emphasize their DNA-binding properties.
The proteins are small, basic and abundant, that is 'histone-like'.
Filter-binding assays show that Sso7d binds DNA at physiological salt
concentrations and electron micrographs reveal the formation of
compacted nucleoprotein particles with both double-stranded (ds) and
single-stranded (ss) DNA 12. The influence of sequence specificity on
Sso7d binding to dsDNA has not been investigated. The functional
significance of ε-mono-methylation of lysines or the effect of
various degrees of methylation on the DNA-binding properties are
unclear.

The Sso7/Sac7 class of proteins may also have other functions in
addition to DNA binding. For instance, the protein contains the
sequence GGGKTGRG (Fig. 1a), which is reminiscent of the 'P-loops'
found in several classes of ATP- and GTP-binding proteins, and might
therefore be a phosphate binding site.

A protein in S. solfataricus, which appears to be identical to the
previously identified Sso7d, has been suggested to act as a
ribonuclease, albeit with a rather narrow substrate specificity. The
protein (called p2 by Fusi et al., who compare p2 to Sac7d but seem
to have been unaware of the published Sso7d sequence) is reported to
be dimeric under native conditions. This observation is in contrast
to other data, which clearly show that Sso7d is monomeric (ref. 12,
present work).

Fig. 1 a, Aligned sequences of basic 7 kDa proteins from S.
solfataricus (Sso7d) and S. acidocaldarius (Sac7a,b,d,e). The
numbering refers to Sso7d. Stars indicate lysine residues subjected
to ε-mono-methylation. The putative phosphate/nucleotide binding site
in Sso7d has been boxed. Residue 13 in Sso7d has been changed from
Glu to Gln based on NMR data. b, Mass spectra of Sso7d from cultures
grown at 75 °C and 88 °C. The numbers indicate calculated masses for
the various species. The expected mass for the non-methylated species
calculated from the sequence is 7147.2 au.

The abundance of Sso7d in S. solfataricus, in combination with its
relatively small size, solubility, thermostability, and ease of
purification makes the protein suitable for biophysical analyses and
structure determination. We have initiated a series of studies to
determine the structure and dynamics of the Sso7/Sac7 class of
proteins, their nucleic acid-binding affinities and specificities, as
well as possible nucleotide binding/hydrolysis. In the present work
we report on the structure of Sso7d in solution and provide a more
detailed characterization of its DNA-binding properties. When
analyzing the structure of Sso7d we made the intriguing observation
that this abundant archaeal protein is in fact structurally similar
to SH3 domains involved in signal transduction in eukaryotes. We also
note that the extent of ε-mono-methylation of lysine residues in
Sso7d depends on cell culture growth temperature, suggesting that the
methylation is a response to heat shock.

Purification and initial characterization 

Sso7d was purified from S. solfataricus (Methods); the protein eluted
in two peaks from the Mono S column used in the final purification
step. Mass spectrometric analysis of (pooled) material from the two
peaks indicates the presence of six masses (Fig. 1b). Mass
differences correspond to sequential substitutions of hydrogen atoms
with methyl groups, as a result of the ε-mono-methylation of lysine
residues described previously. The observation of six peaks with
different methylation patterns is consistent with the notion that
five lysine residues are subjected to methylation. The mass of the
species with the lowest molecular weight corresponds with that
calculated from the sequence (Fig. 1).
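
The six-peak pattern can be illustrated with a short calculation. Each mono-methylation replaces one hydrogen with a methyl group, which adds the mass of one CH2 unit, so five methylatable lysines give a ladder of six masses starting from the non-methylated value of 7147.2 quoted in the Fig. 1 legend. This is a minimal sketch: the average atomic masses used for carbon and hydrogen are standard values assumed here, and the function name is illustrative.

```python
# Expected mass ladder for 0..5 methyl groups on Sso7d.
# Replacing H with CH3 adds one carbon and two hydrogens (a CH2 unit).

CARBON_MASS = 12.011    # average atomic mass, standard value (assumed here)
HYDROGEN_MASS = 1.008   # average atomic mass, standard value (assumed here)
METHYLENE_MASS = CARBON_MASS + 2 * HYDROGEN_MASS

NON_METHYLATED_MASS = 7147.2  # from the Fig. 1 legend
METHYLATABLE_LYSINES = 5      # five lysines are subject to methylation


def methylation_ladder(base_mass, max_methyl_groups):
    """Return the masses for 0, 1, ..., max_methyl_groups methyl groups."""
    return [base_mass + n * METHYLENE_MASS for n in range(max_methyl_groups + 1)]


ladder = methylation_ladder(NON_METHYLATED_MASS, METHYLATABLE_LYSINES)
for n_methyl, mass in enumerate(ladder):
    print(f"{n_methyl} methyl group(s): {mass:.1f}")
```

Adjacent peaks in the ladder are separated by about 14 mass units, which is the spacing expected for sequential methylation.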

Sso7d from the two fractions show NMR chemical shift differences of
0.02-0.12 p.p.m. affecting backbone resonances of residues 2, 3, 6,
11, 12, 16, 17 and 44, but connectivities involving these residues
observed in 2D NOESY spectra are practically identical for material
from the two fractions. The chemical shift differences are most
likely caused by electrostatic effects due to methylation of one of
the lysine residues, because differences in chemical composition can
be ruled out based on mass spectrometry. The presence of two
exchanging conformers can also be ruled out because NOESY spectra
recorded on the two (separated) species (samples 1 and 2, see
Methods) do not change within a period of several months.

The extent of ε-mono-methylation of lysine side chains varies with
bacterial growth conditions so that higher growth temperatures lead
to more extensive methylation (Fig. 1b). The physiological relevance
of this effect is not clear. It is possible that the lysine
methylation is directly related to the stability of the protein
and/or the DNA-protein complex and the response of the organism to
heat shock. The pKa of the lysine side chain is affected very little
by methylation and it seems less likely that methylation has a direct
effect on DNA-binding affinity.

Sso7d binds strongly to dsDNA 

The equilibrium binding of Sso7d to various polynucleotides was
studied by monitoring changes in the intrinsic tryptophan
fluorescence on formation of the complexes. The fluorescence of Trp
23, excited at 290 nm, is quenched by 60-90% on binding and the
emission spectrum is shifted to longer wavelengths (not shown). The
results of titrations performed at low salt (buffer D) and
physiological salt concentration (buffer C) conditions, respectively,
are shown in Fig. 2a,b. Titration curves for four different dsDNA
polymers with alternating purine-pyrimidine sequences at low salt
show an observed quenching, Qobs, which levels out at [...]. There is
little difference in the apparent binding affinity to the various
dsDNAs at low salt, presumably due to quantitative binding to all
DNAs. The binding curves show saturation at an approximate
concentration ratio of 1:6 protein:DNA base pairs (bp), which can be
taken as an estimate of the lower limit for the Sso7d binding site
density on DNA.

There is a definite difference between the Sso7d binding affinities
to various dsDNA sequences at physiological salt concentrations
(Fig. 2b). The binding is strongest to poly(dIdC) and poly(dAdU), for
which the affinities are approximately equal. The DNA concentration
at half saturation is in this case approximately 8 µM bp. This number
corresponds to an affinity constant of ~0.5-1 × 10^6 (M sites on
DNA)^-1 if one (conservatively) assumes that the maximum binding site
density is in the range 1:[...]-1:[...] protein:DNA bp. Binding to
poly(dGdC) is somewhat weaker and binding to poly(dAdT) is about 5-10
times weaker than that to poly(dAdU) and poly(dIdC).
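
The step from a half-saturation concentration in base pairs to an affinity constant per binding site can be sketched as follows. This is a simplified estimate, not the authors' calculation: it assumes that the DNA is in excess so that free sites are approximately equal to total sites, and the site sizes tried below (4, 6 and 8 bp per protein) are hypothetical values chosen only for illustration.

```python
# Rough affinity estimate from a half-saturation point.
# At half saturation, K ~ 1 / [free binding sites]; with DNA in excess the
# free site concentration is approximated by the total site concentration.

HALF_SATURATION_BP = 8e-6  # mol/L of base pairs, from the text


def association_constant(half_saturation_bp, bp_per_site):
    """Return an approximate association constant in (M sites)^-1."""
    site_concentration = half_saturation_bp / bp_per_site
    return 1.0 / site_concentration


# Hypothetical binding site sizes, for illustration only.
for bp_per_site in (4, 6, 8):
    k = association_constant(HALF_SATURATION_BP, bp_per_site)
    print(f"{bp_per_site} bp per site: K ~ {k:.2e} per M sites")
```

Under these assumptions every site size in this range gives a constant of order 10^6 per M sites, so the order of magnitude does not depend strongly on the assumed site density.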

The binding affinities of Sso7d to various alternating 
dsDNA sequences can be rationalized as follows. First, a 

Fig. 2 Analysis of DNA binding by Sso7d. a, Equilibrium titrations of Sso7d with various polynucleotides and
mononucleotides based on fractional fluorescence quenching (Qobs). The titrations are performed at low salt
concentration (buffer D) as reverse titrations in which the protein concentration is kept constant (2 µM). b, Equilibrium
titrations performed at a higher salt concentration, which is closer to physiological conditions (buffer C), with 1 µM
[...]. Titrations are shown for poly(dGdC), poly(dAdT), poly(dIdC), poly(dAdU), poly(dA), poly(dC), poly(rA),
poly(rC), dATP and dCTP. The abscissa legends indicate that concentrations of double-stranded DNAs are measured
in base pairs and concentrations of single-stranded polynucleotides and mononucleotides are measured in bases.
c, Thermal denaturation profiles of poly(dIdC) in the absence and presence of bound Sso7d: no added protein,
Sso7d added to a concentration corresponding to 1:15 Sso7d:DNA bp, and Sso7d added to a concentration
corresponding to 1:3.6 Sso7d:DNA bp. The poly(dIdC) concentration was 12 µM bp. The thermal denaturation
experiments were performed at low salt concentration conditions (buffer E).
Fig. 3 a, Two-dimensional 500 MHz NOESY spectrum of Sso7d
(concentration ~2.5 mM in 90%:10% H2O:D2O). b, Schematic view of the
two antiparallel β-sheets in Sso7d. Hydrogen bonds used in the SA
simulations and observed NOEs are indicated with dashed lines and
arrows, respectively. Additional hydrogen bonds, not used in SA, are
described in the text.



methyl group at position 5 (in the major groove) of the pyrimidine is
unfavourable for binding. This is clear when comparing binding to
poly(dAdU) and poly(dAdT). Thus, DNA-protein interactions may occur
within the DNA major groove. Second, binding to dsDNA sequences with
two inter-strand hydrogen bonds is stronger than to those with three
hydrogen bonds in polymers lacking the pyrimidine methyl (that is,
when comparing poly(dAdU) and poly(dIdC) to poly(dGdC)). This
behaviour might be related to some physical property such as
flexibility, considering that Sso7d seems to induce condensation of
DNA 12.

Titration curves for Sso7d binding to ssDNA and ssRNA homopolymers in
the presence of low salt concentrations show saturation at
Qobs = 0.6-0.7. The binding to ssDNA and ssRNA under these conditions
appears to be weaker than that to dsDNA, although there is a
possibility that these complexes are as strong as those with dsDNA
but that the maximum binding-site density is lower. However, the
thermal denaturation studies described below indicate that dsDNA is
preferred over ssDNA, because the melting temperature increases on
formation of the complex. Furthermore, increasing the salt
concentrations to physiological levels has a dramatic effect on the
binding to single-stranded polynucleotides (Fig. 2b). Under these
conditions there is only very weak binding to poly(dA) and poly(dC),
whereas no binding to poly(rA) and poly(rC) can be detected at
polymer concentrations < 100 µM bases. Thus, there seems to be a
large binding preference for dsDNA compared to ssDNA and ssRNA at
higher salt concentration conditions.

At low salt concentrations it is also possible to monitor binding of
the mononucleotides dATP and dCTP through the quenching of Trp 23
fluorescence (Fig. 2a). The titration curves do not show saturation
and it is difficult to estimate stoichiometries and affinities based
on the present data, but the binding seems to be weaker than that of
the DNA and RNA polymers.

Protection of DNA from denaturation 

Thermal denaturation profiles of double-stranded poly(dIdC) in the
absence and presence of bound Sso7d are shown in Fig. 2c. Poly(dIdC)
is thermally unstable above 32 °C at the conditions used in the
experiment shown in Fig. 2c. Addition of less than stoichiometric
amounts of Sso7d increases the thermal stability of poly(dIdC),
yielding a biphasic DNA melting curve. Saturation of poly(dIdC) with
bound Sso7d again results in a single phase denaturation profile with
a melting temperature of about 70 °C. Thus, binding of Sso7d
increases the melting temperature of poly(dIdC) by more than 38 °C at
low salt concentrations. Similar, albeit somewhat attenuated, effects
can be observed with shorter DNA oligomers at physiological salt
concentrations (data not shown). It is difficult to quantify the
effect of Sso7d binding to DNA polymers at high salt concentrations
because melting temperatures are high even in the absence of bound
protein. However, it seems possible that Sso7d binding may shift the
melting temperature of DNA above that of the boiling point of water.
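
One common way to read a melting temperature off a single-phase denaturation profile is to take the temperature at which half of the duplex has melted. The sketch below does this by linear interpolation on synthetic sigmoidal curves centred at 32 °C and 70 °C, the two temperatures quoted above. The curves, their width, and all function names are hypothetical and stand in for the measured profiles of Fig. 2c, which are not reproduced here; the text does not state how the melting temperatures were actually determined.

```python
import math


def melting_temperature(temperatures, fraction_denatured):
    """Return the temperature where the denatured fraction crosses 0.5.

    Uses linear interpolation between the two bracketing data points.
    """
    points = list(zip(temperatures, fraction_denatured))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 < 0.5 <= f1:
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("curve does not cross the 0.5 midpoint")


def synthetic_curve(midpoint, width=2.0, start=20, stop=90):
    """Build a hypothetical sigmoidal melting curve (illustration only)."""
    temperatures = list(range(start, stop + 1))
    fractions = [1.0 / (1.0 + math.exp((midpoint - t) / width)) for t in temperatures]
    return temperatures, fractions


free_tm = melting_temperature(*synthetic_curve(32.0))
bound_tm = melting_temperature(*synthetic_curve(70.0))
shift = bound_tm - free_tm
print(f"Tm free: {free_tm:.1f} C, Tm saturated: {bound_tm:.1f} C, shift: {shift:.1f} C")
```

The same midpoint approach does not apply directly to the biphasic curve obtained at sub-stoichiometric protein concentrations, where two transitions would have to be separated first.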






The remarkable effect of Sso7d binding on DNA thermal stability is
very similar to that of the HTa protein from Thermoplasma
acidophilum 17. Stein and Searcy 17 argue that the HTa protein may
act to protect bacterial DNA during short periods of denaturing
conditions, allowing the organism to cope with transient periods of
high temperatures. The Sso7d protein may function in a similar manner
in Sulfolobus. The different extent of lysine methylation of proteins
expressed at different growth temperatures may also relate to the
bacterial response to heat shock and stabilization of functionally
important proteins. However, the effect of Sso7d methylation on its
DNA-stabilizing properties is unknown.

NMR structure determination 

Two-dimensional NMR spectra of Sso7d were recorded at 500 and 600
MHz. The 1H spectrum (Fig. 3a) shows a very favourable resonance
dispersion and could be almost completely assigned using standard
methodologies. Upon assigning the sequence we found one disagreement
with the published sequence: residue 13, which is a Glu in the
sequence of Choli et al., is in fact a Gln and this correction has
been made in Fig. 1a. The 1H linewidths in Sso7d (3-8 Hz) are typical
for a protein with a relative molecular mass of 7,000, indicating
that Sso7d is predominantly monomeric under the conditions used in
the NMR experiments.

The NOESY spectrum of Sso7d contains stretches of very strong
sequential dαN(i,i+1) NOE connectivities in combination with strong
long-range dαα and dαN NOEs, which are typical for β-sheet secondary
structures. These arise from one double-stranded and one
triple-stranded anti-parallel β-sheet (Fig. 3b). The pattern of
intra- and inter-residue NOE connectivities, the observation of
slowly exchanging backbone amide protons and low amide temperature
coefficients allowed the identification of 14 intramolecular
backbone-backbone amide hydrogen bonds within the anti-parallel
β-sheets (Fig. 3b).

The three-dimensional structure of a fragment containing residues
1-62 of Sso7d was calculated using a dynamic simulated annealing (SA)
protocol with 617 non-redundant NOE distance constraints, 11 φ
dihedral-angle constraints and 28 hydrogen bond distance constraints
(two constraints per hydrogen bond), that is 10.6 constraints per
residue. The NOE distances (dij) were distributed as 233 intraresidue
(i=j), 151 sequential (|i-j|=1), 51 medium range (2≤|i-j|≤4), and 182
long range (|i-j|≥5) NOEs (Table 1). The quality of the computed SA
structures is good as judged from the low Lennard-Jones potential
energies and the very small average deviations from idealized
geometries. The distance constraint violation statistics are also
good: the average number of distance constraint violations >0.3 Å is
0.2 per structure and the largest violation found in any of the 35
structures is 0.38 Å. The largest dihedral angle constraint violation
is 3.2°.
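
The constraint bookkeeping in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are illustrative.

```python
# Consistency check of the NMR constraint counts quoted in the text.

NOE_CLASSES = {
    "intraresidue": 233,
    "sequential": 151,
    "medium_range": 51,
    "long_range": 182,
}
DIHEDRAL_CONSTRAINTS = 11
HYDROGEN_BONDS = 14
CONSTRAINTS_PER_HYDROGEN_BOND = 2
RESIDUES_IN_FRAGMENT = 62  # residues 1-62

total_noe = sum(NOE_CLASSES.values())
hydrogen_bond_constraints = HYDROGEN_BONDS * CONSTRAINTS_PER_HYDROGEN_BOND
total_constraints = total_noe + DIHEDRAL_CONSTRAINTS + hydrogen_bond_constraints
constraints_per_residue = total_constraints / RESIDUES_IN_FRAGMENT

print(f"NOE distance constraints: {total_noe}")
print(f"All constraints: {total_constraints}")
print(f"Constraints per residue: {constraints_per_residue:.1f}")
```

The four NOE classes add up to the stated 617 distance constraints, and the total divided by 62 residues reproduces the quoted 10.6 constraints per residue.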

A plot of average backbone dihedral angles in the 35 SA structures is
shown in Fig. 4a and plots of dihedral angle order parameters are
shown in Fig. 4b-d. Average backbone dihedrals are all within the
allowed regions of a Ramachandran diagram (not shown), except for
those of Lys 8. The backbone of this residue is less well defined, as
judged from the angular order parameters, which results in a
sterically unfavourable geometric average. The superimposed backbones
of the final SA structures are shown in stereo in Fig. 5a. The
backbone conformation within the β-sheet regions is well defined, as
indicated by atomic backbone root-mean-square deviations of 0.5±0.1 Å
compared to the geometric average structure (Table 2). Other regions
are somewhat less well defined, as indicated by an overall backbone
r.m.s.d. of 1.1±0.2 Å. The side chains of several residues in the
hydrophobic core of Sso7d are also well resolved, as can be seen in
Fig. 5b. The C-terminal fragment (residues 46-60) is somewhat better
defined than the loop regions, with a backbone r.m.s.d. of 0.9±0.2 Å,
and a short α-helix including residues 52-59 is clearly discernible.
This helix can also be deduced from a continuous stretch of strong
sequential dNN(i,i+1) and medium range dαN(i,i+3) and dαβ(i,i+3) NOE
connectivities.

Table 1 Structural statistics for Sso7d a

                                                 <SA>                (SA)r
R.m.s. deviation from experimental distance (Å)
and dihedral angle (deg) restraints b
  distance restraints (617)                      0.025 ± 0.0018      0.024
  dihedral angle restraints (11)                 0.26 ± 0.23         0
No. of violations c
  distance restraints (>0.3 Å)                   0.20                0
  dihedral angle restraints (>1°)                0.31                0
E L-J (kcal mol^-1) d                            -172 ± 20           -214
Deviations from idealized covalent geometry
  bonds (Å)                                      0.0025 ± 0.00016    0.0026
  angles (deg)                                   0.36 ± 0.015        0.36
  impropers (deg)                                0.24 ± 0.03         0.22

a The notation of the NMR structures is as follows: <SA> are the final 35 simulated annealing structures; (SA)r is
the mean structure obtained by averaging the coordinates of the individual SA structures best fit to each other,
followed by minimization by restrained regularization.
b The number of restraints is given in parentheses.
c The maximum distance violation is 0.38 Å and the maximum dihedral angle violation is 3.2° in an individual SA
structure.
d E L-J is the Lennard-Jones van der Waals energy calculated with the CHARMM 47 force field.

Fig. 4 Average φ and ψ dihedral angles (a) and angular order
parameters

The final set of SA structures contains several hydrogen bonds, in
addition to those used in the structure calculations. These involve
the backbone amide protons and carbonyl oxygens of residues 18 and
15, 19 and 15, 20 and 32, 25 and 28, 27 and 25, 50 and 46, and 50 and
47, respectively.

The Sso7d structure 

Sso7d is a globular protein. The tertiary fold consists of a
triple-stranded anti-parallel β-sheet, consisting of residues 21-25,
28-33 and 41-46 (strands III, IV and V, respectively), onto which a
double-stranded β-sheet, made up of residues 2-7 and 10-15 (strands I
and II), is packed in an orthogonal manner. The hydrophobic core
consists of side chains at the interface of the two sheets, including
those indicated in Fig. 5b. Strands I and II are connected through a
type II reverse turn with a hydrogen bond between the carbonyl of Tyr
7 and the amide of Glu 10. Strand II ends in one complete turn of an
α-helix involving residues 16-19, with a hydrogen bond between the
carbonyl of Asp 15 and the amide of Ile 19. Strands III and IV in the
second β-sheet are connected by a type I reverse turn involving
residues 25-28. Thus, hydrogen bonds between the carbonyl of Val 25
and the amide of Met 28, and the amide of Val 25 and the carbonyl of
Met 28 are present in the triple-stranded β-sheet, in addition to
those shown in Fig. 3b. Residues 35-40 form a surface loop,
containing the glycine tripeptide Gly 36-Gly 37-Gly 38 (Fig. [...]).
The structure of this loop is not very well defined by the NMR
constraints and it is clear that it can show a large degree of
inherent flexibility. Strand V (residues 41-46) ends in a complete
turn of an α-helix involving residues 47-50. This short helical
segment is anchored through hydrophobic interactions involving Ala 50
and Pro 51. The backbone of the C-terminal fragment is not as well
defined as the β-sheets, but residues 52-59 appear to form two turns
of α-helix. This short helix is packed against the core through
hydrophobic interactions between Leu 54 and Ala 50.

Table 2 Atomic r.m.s. difference statistics for the Sso7d structure a

Comparison       Residues b                         Backbone c (Å)   All heavy atoms (Å)
<SA> vs SAav     1-60                               1.08±0.17        1.60±0.16
                 46-60                              0.95±0.22        1.72±0.28
                 2-7, 10-15, 21-25, 28-34, 41-45    0.54±0.09        1.14±0.11
(SA)r vs SAav    1-60                               0.45             0.80

The surface of Sso7d contains a hydrophobic cleft and several exposed
hydrophobic side chains (Fig. 6a). The hydrophobic cleft consists of
the N-terminal Ala 1 side chain and the isoleucine residues Ile 16
and Ile 19 on one 'side', and the side chains of Pro 51, Leu 54 and
Met 57 of the C-terminal helix on the other. The Trp 23 and Val 25
side chains of strand III are completely exposed to the solvent and
so is the methyl of Ala 44. The side chains of Tyr 7 and Met 28 are
partially exposed on the surface.

The many basic lysine and arginine side chains are rather evenly
distributed at the surface and the positive charges seem to be
partially compensated for by nearby acidic side chains. However, the
face of the triple-stranded β-sheet appears to be predominantly
positive in charge. This surface also contains the exposed Trp 23
side chain: the fluorescence of this residue is quenched by 90% upon
formation of a complex with dsDNA. Thus, this face of the protein may
be the DNA binding surface.

Sso7d and eukaryotic SH3 domains 

The topology of Sso7d is very similar to that of eukaryotic SH3
domains (Fig. 7a). The SH3 domains are small protein modules (about
60 residues) which, together with SH2 domains, are found in many
proteins involved in signal transduction in eukaryotes. The SH2 and
SH3 domains are commonly found in kinases or phospholipases, where
they are believed to participate in protein-protein interactions. The
structures of SH3 domains from several proteins have recently been
solved by both NMR spectroscopy and X-ray crystallography 21,22.

The minimized average structure of Sso7d is compared
with the structures of the SH3 domains of chicken
brain α-spectrin (PDB entry 1SHG) and human fyn
proto-oncogene (PDB entry 1SHF) in Fig. 7a, and an
alignment of the three sequences based on secondary
structure and folding topology is shown in Fig. 7b. The
superimpositions included 38 Cα coordinates of the five
β-strands and a fragment from the C terminus in Sso7d
(residues 1-7, 10-16, 21-25, 28-33, and 41-53; Fig. 7).
The r.m.s.ds with corresponding fragments in the α-spectrin
and fyn SH3 domains are in both cases 3.3 Å.
Thus, there is a good quantitative agreement between
these structures. Differences are found at the N and C
termini and for surface loops. In particular, the interconnection
between the β-strands of the two SH3 domains
which corresponds to strands IV and V in Sso7d
is extended into the putative P-loop in Sso7d (Fig. 7).

*Notations correspond to those defined in Table 1 with the addition that SA_av
is the non-minimized geometric average structure. Residues 61 and 62 are
excluded from the comparison due to lack of structural constraints in this
region.
†Superimposed fragments.
‡Atoms N, C and Cα.

Comparison of the complete sequences of Sso7d and
the SH3 domains does not reveal sequence homology.
However, homology can be inferred when considering
only the fragments for which there is structural similarity,
that is, when excluding loops and N and C termini,
although any homology is still too weak to be conclusive
by conventional alignment algorithms. Sequence
identities and sequence similarities (aromatic/hydrophobic
residues) in the fragments that were used in the structural
alignment are shown in Fig. 7b. It is worth noting
that several residues which are well conserved among
various SH3 domains are present at the corresponding
positions in Sso7d. These include Val 3 in Sso7d (an alanine
in SH3), Phe 5 and Tyr 7 (aromatics), Lys 12 (lysine),
Val 22 and Trp 23 (hydrophobic), Met 28 and Ile 29 (tryptophan
and tryptophan/hydrophobic), Gly 43 (glycine),
Ala 44 and Val 45 (aromatic or hydrophobic), and Ala 50
(hydrophobic). Sso7d and SH3 domains are also similar
in that they expose hydrophobic surfaces.

The possible origin and significance of the structural
similarity between Sso7d, which is an abundant protein
in the archaeon Sulfolobus, and the SH3 domains,
which appear to have assumed highly specialized roles
in signal transduction in eukaryotes, is unclear. One scenario
may be that the fold has survived in all kingdoms
due to its (thermal) stability and because it forms a suitably
small and stable platform for different functions in
various organisms. An SH3-like fold has also recently
been discovered for a small protein in the photosystem I
complex (PsaE) in cyanobacteria. Structural similarities
to SH3 have also been noted in another DNA-binding
protein: the biotin biosynthetic operon repressor
(BirA) in E. coli.

Methods 

Fig. 5 a, Stereoview of superimposed backbone traces of residues 1-62 in Sso7d. For the sake of
clarity, only 11 of the 35 SA structures are shown. The structures are superimposed to minimize
r.m.s. differences of backbone atoms in residues 1-60. N and C termini are coloured in blue and
red, respectively. The loop containing the putative phosphate/nucleotide binding site is coloured
in green. b, Stereoview showing the resolution and packing of hydrophobic side chains in the
protein core. The structures have in this case been superimposed to minimize r.m.s. deviations
between heavy atoms of residues constituting the core.

Protein purification. Sulfolobus solfataricus (DSM 1617), isolated
from volcanic hot springs in Italy, was purchased from the
[...]. Cultivation was performed aerobically at [...]
additional 10 g l⁻¹ saccharose in a membrane fermenter
(Bioengineering). The cells were heat-shocked for 90 min at [...]
and harvested by centrifugation. Protein was also purified from
cells that had not been subjected to heat shock, for comparison
of the extent of lysine methylation.
100 g cells were lysed in buffer A (10 mM Tris buffer, pH 8.8,
20 mM NaCl, 10% glycerol) by passing the cell suspension through
a French press. The lysate was centrifuged to remove cell debris
and dialyzed against the same buffer. The cytosolic proteins were
loaded onto a Mono Q (Pharmacia HR10/10) column equilibrated
with buffer A; Sso7 was found in the flow-through. This fraction
was concentrated in an Amicon stirred cell and applied in [...]
fractions to a Superose 6 column (90 x 1.5 cm) equilibrated with
30 mM Tris/HCl and 200 mM NaCl at pH 7.4. Fractions containing
Sso7 were pooled, dialyzed against 50 mM potassium phosphate,
50 mM NaCl at pH 6.0, loaded onto a Mono S (Pharmacia HR10/10)
column equilibrated with the same buffer and eluted with a
linear gradient of buffer B (50 mM potassium phosphate pH 8 [...]
NaCl). Sso7d eluted at 25% B in two separate peaks, due to the
presence of differently methylated species of the protein.
Sso7d concentrations were measured spectrophotometrically on
a Cary 4E spectrophotometer using an extinction coefficient
calculated from tyrosine (ε = 1400 M⁻¹ cm⁻¹) and tryptophan
(ε = 5500 M⁻¹ cm⁻¹) absorption.
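
The concentration measurement described above amounts to summing the per-residue extinction coefficients and applying the Beer-Lambert law. The sketch below is only illustrative: the function names are hypothetical, and the tyrosine and tryptophan counts in the usage example are placeholders, since the residue counts used by the authors are not stated in this passage.

```python
# Per-residue molar extinction coefficients quoted in the text (M^-1 cm^-1).
EPS_TYR = 1400.0
EPS_TRP = 5500.0


def protein_extinction(n_tyr: int, n_trp: int) -> float:
    """Sum the tyrosine and tryptophan contributions to the protein's extinction coefficient."""
    return n_tyr * EPS_TYR + n_trp * EPS_TRP


def concentration_molar(absorbance: float, n_tyr: int, n_trp: int,
                        path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * path length)."""
    eps = protein_extinction(n_tyr, n_trp)
    if eps <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return absorbance / (eps * path_cm)


if __name__ == "__main__":
    # Placeholder composition (2 Tyr, 1 Trp); substitute the real residue counts.
    print(protein_extinction(2, 1))
    print(concentration_molar(0.83, 2, 1))
```

With the placeholder composition, an absorbance of 0.83 in a 1 cm cell corresponds to roughly 0.1 mM protein.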

NMR samples were prepared in 90%:10% H2O:D2O or 100%
D2O with 20 mM potassium phosphate (pH 5 or 6), 50 mM NaCl
and 0.1% azide. The structure determination is based on data
recorded on the following four NMR samples: 2.5 mM protein at
pH 6 containing material from both peaks eluted from the Mono
S column; ~0.2 mM protein at pH 6 containing material eluting
under peak 2; 1 mM protein at pH 5 containing material eluting
under peak 1; and 2 mM protein containing both fractions in D2O
buffer at pH 6 (non-corrected pH meter reading). The first and
last samples contained two distinct NMR species. A combination
of spectra collected on the second and third samples corresponds
to the NMR spectrum of sample 1.



Fig. 6 Space-filling model of Sso7d showing exposed hydrophobic 
(yellow) and aromatic (orange) side chains (tyrosine hydroxyls 
are also coloured in orange). The glycines in fragment 36-38 are 
coloured in green. The views in (a) and (b) are from opposite
directions. N and C termini are indicated in (a).



Mass spectrometric analysis. Mass spectrometry (MS) was
carried out at Pharmacia BioScience Center, Stockholm, using a
VG Platform mass spectrometer from Fisons Instruments equipped
with an electrospray interface. The mobile phase consisted of
methanol/water (1:1) with 1% [...] acid. The range 700 < (M/z)
< 1700, where M is the mass and z is the charge, was scanned
and calibrated using horse heart myoglobin as a calibration
standard. Uncertainties in molecular mass determinations are
approximately two mass units.



Equilibrium titrations. The DNA and RNA polynucleotides used
were purchased from Pharmacia and dissolved in 150 mM NaCl
and 10 mM Tris/HCl at pH 7.4. Polynucleotide concentrations were
determined spectrophotometrically using extinction coefficients
given by Pharmacia. The deoxynucleosides ATP and CTP were
purchased from Boehringer-Mannheim.

Equilibrium titrations were carried out at 20 °C in buffer C (100
mM NaCl, 1 mM MgCl2, 0.1 mM octaethylene glycol monododecyl
ether (C12E8) and 20 mM Tris/HCl at pH 7.4) and in buffer D (0.5
mM C12E8 and 20 mM Tris/HCl at pH 7.4), for which the pH
measurements refer to 20 °C. Titrations were performed as reverse
titrations, in which different amounts of DNA/RNA were added at
constant protein concentration (1 µM in buffer C and 2 µM in
buffer D). Steady-state fluorescence measurements were carried
out on a Shimadzu RF-5000 spectrofluorophotometer using the
methodology and additional titration instrumentation recently
described elsewhere. The excitation wavelength was 290 nm
and emission intensities were sampled at 0.2 nm intervals within
the wavelength range 340-355 nm. Emission spectra were
recorded five times for each titration point in order to minimize
effects of instrumental fluctuations. Measured fluorescence
intensities were corrected for background emission by subtracting
(small) signals from buffer samples and for optical filtering effects
due to DNA absorption at 290 nm.

The fractional fluorescence quenching (Q_obs) was calculated as
(I0 - I)/I0, where I0 is the protein fluorescence intensity observed in the
absence of DNA/RNA and I is the intensity in the presence of DNA/
RNA. Binding isotherms are presented as plots of Q_obs against the
logarithm of the basepair (dsDNA) or base (ssDNA, ssRNA and
monodeoxynucleosides) concentration.
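
As a minimal sketch of the quenching calculation just described, the following computes the fractional quenching and the (log concentration, quenching) pairs of a binding isotherm. The function names and the titration values in the usage example are hypothetical; the background and optical-filtering corrections mentioned in the text are assumed to have been applied to the intensities beforehand.

```python
import math


def fractional_quenching(i0: float, i: float) -> float:
    """Fractional quenching (I0 - I) / I0 for corrected fluorescence intensities."""
    if i0 <= 0:
        raise ValueError("I0 must be positive")
    return (i0 - i) / i0


def isotherm_points(i0: float, titration):
    """Return (log10 concentration, quenching) pairs for a list of
    (nucleic acid concentration in M, corrected intensity) titration points."""
    return [(math.log10(conc), fractional_quenching(i0, inten))
            for conc, inten in titration if conc > 0]


if __name__ == "__main__":
    # Hypothetical titration: concentrations in M (base pairs), intensities in arbitrary units.
    data = [(1e-7, 95.0), (1e-6, 70.0), (1e-5, 25.0), (1e-4, 10.0)]
    for log_c, q in isotherm_points(100.0, data):
        print(f"{log_c:6.2f}  {q:.2f}")
```

In this made-up series the last point reaches a quenching of 0.90, the same magnitude as the 90% quenching of Trp 23 reported for the dsDNA complex.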

DNA melting studies. Light absorption of poly(dIdC) at 260 nm
was measured as a function of temperature on a Cary 4E
spectrophotometer, which allows the simultaneous measurement
of up to six melting curves. The temperature was increased in
steps of 1 °C during a time period of 30 s, followed by a holding
time of 60 s prior to absorbance measurements. The denaturation
experiments were performed in 5 mM Tris/HCl at pH 7.0 (buffer
E) with various concentrations of added Sso7d.

NMR spectroscopy. NMR spectra were recorded on Varian Unity
500 and 600 NMR spectrometers operating at magnetic fields of
11.74 and 14.09 T, respectively, and equipped with programmable
pulse modulators and pulsed field gradient hardware. Spectra were
recorded at 293, 303, 313 and 323 K. 1H chemical shifts at 303 K
(available from the authors) are referenced to H2O at 4.74 p.p.m.
Phase sensitive two-dimensional spectra were recorded in the
hypercomplex mode.



Two-dimensional homonuclear DQF-COSY, NOESY, and clean-TOCSY
spectra were recorded using spectral widths of 6,000
Hz, 2*512 t1 increments, 1024 complex data points in the
acquisition time domain and with 8-32 transients per t1 increment.
NOESY spectra were recorded using cross relaxation mixing times
of 60 or 200 ms and clean-TOCSY spectra were recorded using
isotropic mixing times of 10, 60 or 80 ms. A 2D 1H,13C-HSQC
spectrum was recorded using gradient selection with 1H and
13C sweep widths of 6000 Hz and 20000 Hz, respectively, 2*128
t1 increments, 512 complex data points and 160 transients per
increment. The HSQC sequence was optimized for a C-H scalar
coupling constant of 140 Hz, with the 13C transmitter placed at
57 p.p.m. 2D SS-NOESY spectra were recorded with a sweep
width of 8000 Hz and a 200 ms mixing time. The third pulse in
the SS-NOESY sequence is a shifted laminar pulse creating a
zero net excitation at the frequency of the transmitter (water
resonance). Water suppression was achieved by presaturation of
the water signal or presaturation in combination with SCUBA water
suppression. No presaturation was used in the HSQC and SS-NOESY
experiments.

NMR spectra were processed using software from Varian (VNMR)
and/or Biosym Technologies ([...] 2.2). Data processing typically
involved apodization with shifted Gaussian functions in the t2
(acquisition time) domain and [...]/cosine bell functions in t1, and
baseline correction using routines available within the two
software packages. Processed spectra typically contained
1024x1024 real data points.

Fig. 7 a, Comparison of folding topologies in Sso7d and SH3 domains. The stereo picture contains
the superimposed backbones of Sso7d (grey), the SH3 domains of chicken brain α-spectrin (green)
and the human fyn proto-oncogene (blue). b, Secondary structure based alignment of the Sso7d
sequence to those of the SH3 domains of chicken brain α-spectrin (C spec α), and the human fyn
proto-oncogene (H fyn). Elements of secondary structure in Sso7d are shown at the top. The
numbering refers to the Sso7d sequence. The grey bars indicate fragments used in the structure-based
alignment. Orange boxes indicate similar or identical hydrophobic residues within the
aligned sequences. The blue and green boxes denote a lysine and a glycine which are located at
identical positions in the aligned sequences.

NMR data analysis. Spin system identification and sequential
resonance assignments of 1H resonances in Sso7d were carried
out in homonuclear 2D spectra using standard methodologies.
The natural abundance 1H,13C-HSQC spectrum aided significantly
when sorting out 1H methyl and aromatic resonances. Most
assignment work and collection of NOE constraints were carried
out on spectra recorded at 303 K. Analysis of NMR spectra and
compilation of NOE data were performed using the interactive
computer graphics program ANSIG.

Stereospecific assignments of prochiral methylene groups were
carried out by identifying predominant χ1 rotameric states using
3J(αβ) coupling constants measured in DQF-COSY spectra and
intraresidue NOEs measured with a short (60 ms) mixing time.
The relative magnitudes of 3J(αβ2) and 3J(αβ3) coupling constants
could also be measured in clean-TOCSY spectra recorded with a
short (10 ms) mixing time using reported simulations as a
reference for expected cross peak intensities. Valine methyl groups
were stereospecifically assigned and χ1 rotamers [...] from the
magnitude of the 3J(αβ) coupling and the relative intensities of
intraresidue NOE connectivities (note that the notations
of valine γ1 and γ2 methyls in ref. 41 are exchanged compared to
convention).

The χ1 rotameric states of Thr 2 and Thr 32 were estimated as
follows. Both residues have relatively small 3J(αβ) coupling
constants and the HN-Hα cross peaks in DQF-COSY are quadratic,
[...] either χ1 = 60° or χ1 = 180°. Inspection of the short mixing
time NOESY spectrum revealed that d[...] > d[...] in Thr 2, which
is consistent with χ1 = 180°, whereas d[...] > d[...] in Thr 32, which
is consistent with χ1 = 60°.

NOEs were quantified as distance constraints based on cross peak
volumes measured in a NOESY spectrum recorded with a mixing
time of 60 ms. The conversion of volumes into distances was
based on calibration of observed intraresidue and sequential NOEs
within well-defined segments of anti-parallel β-sheet. NOE
volumes involving HN protons were corrected for the presence of
10% D2O in the sample. Cross peak volumes involving methyl
protons were divided by three prior to conversion into distance
constraints. Distance constraints were divided into four classes:
strong (<2.7 Å), medium (<3.3 Å), weak (<5.0 Å) and very weak
(<6.0 Å). Pseudoatoms with appropriate distance corrections were
created for non-stereospecifically assigned methylene protons,
aromatic ring protons and the methyl groups in leucines. A
(modified) pseudoatom correction of 0.3 Å was used to account
for effects due to rapid rotation of methyl groups.
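
The four constraint classes can be sketched as a simple lookup that assigns a calibrated distance to the tightest class whose upper bound exceeds it. This is an illustration only: the function names are hypothetical, and the calibration that converts cross peak volumes into distances is not reproduced here because its details are not given in the text.

```python
# Upper bounds (in Angstrom) of the four NOE classes listed in the text.
NOE_CLASSES = [
    ("strong", 2.7),
    ("medium", 3.3),
    ("weak", 5.0),
    ("very weak", 6.0),
]


def methyl_corrected_volume(volume: float) -> float:
    """Divide a methyl cross peak volume by three before conversion to a distance."""
    return volume / 3.0


def classify_noe(distance_angstrom: float):
    """Return (class name, upper bound) for a calibrated distance,
    or None if the distance exceeds the loosest class."""
    for name, upper in NOE_CLASSES:
        if distance_angstrom < upper:
            return name, upper
    return None


if __name__ == "__main__":
    for d in (2.4, 3.0, 4.2, 5.5, 6.5):
        print(d, classify_noe(d))
```

How a distance lying exactly on a class boundary was treated is not stated in the text; the strict inequality here is an assumption.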
A total of 14 hydrogen-bonded amide protons could be identified
either as slowly exchanging resonances in a TOCSY spectrum of
Sso7d dissolved in D2O, or as amide-proton resonances for which
the temperature dependence of the chemical shift is small (< [...]
p.p.b. K⁻¹) compared to that of C-terminal residues which are
exposed to the solvent (> 8 p.p.b. K⁻¹). These experimentally
supported hydrogen bonds (between backbone amide protons
and carbonyl oxygens) were imposed in the structure calculations
as 28 distance constraints with lower and upper bounds of 1.8 Å
and 2.4 Å for amide hydrogen to carbonyl oxygen distances, and
2.6 Å and 3.4 Å for amide nitrogen to carbonyl oxygen distances,
respectively. The hydrogen bond constraints were imposed at a
late stage of the structure refinement at which point hydrogen
bond donor-acceptor pairs could be unambiguously identified. All
hydrogen bonds used in the calculations are within well-defined
regions of anti-parallel β-sheet. A table of sequential assignments
of the Sso7d 1H NMR spectrum at 30 °C and pH 6.0 is available
from the authors on request.

Structure Calculations. Three-dimensional structures were
determined using a dynamic simulated annealing (SA) method
implemented within the X-PLOR 3.0 program. The protocol of
Nilges et al. was used with some modifications, as described
below. Extended peptide conformations were used as starting
structures in the simulations. The X-PLOR force field, containing
potentials for chemical bonds, repulsive van der Waals' interactions
and experimental (distance and dihedral) constraints, was used.
The force constant of the distance constraint potential was set to 50
kcal mol⁻¹ Å⁻² and the force constant of the dihedral (χ1) square-well
potential was set to 200 kcal mol⁻¹ rad⁻². Force constants for
planarity and chirality were set to 50 kcal mol⁻¹ rad⁻². The
simulations were carried out in five stages: i, 100 steps of Powell
energy minimization to remove bad non-bonded contacts; ii, 15
ps of dynamics at 1000 K with normal van der Waals' radii and a
low repulsive force constant (0.002 kcal mol⁻¹ Å⁻⁴); iii, 10 ps of
1000 K dynamics during which the repulsive force constant was
increased to 0.1 kcal mol⁻¹ Å⁻⁴ and the asymptote in the NOE soft
square well potential (constant c in ref. 44) was increased from
0.0 to 1.0 (in 10 steps); iv, cooling to 300 K during 5.6 ps (28 steps
of 0.2 ps with 25 K cooling/step) with repulsive force constant of
4.0 kcal mol⁻¹ Å⁻⁴ and van der Waals' radii scaled by 0.8; and v,
1200 steps of Powell minimization with normal van der Waals'
radii and force constants for planarity and chirality set to 500 kcal
mol⁻¹ rad⁻². A 1 fs time step was used throughout with bonds
constrained using the SHAKE algorithm during stages ii-iv.
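
The arithmetic of the cooling stage (28 steps of 0.2 ps with 25 K cooling per step, starting from the 1000 K of the preceding dynamics stage) can be checked with a short sketch. The function below is hypothetical and only reproduces the temperature ladder; it is not X-PLOR input, and it assumes the temperature is lowered at the start of each 0.2 ps segment.

```python
def cooling_schedule(t_start: float = 1000.0, step_k: float = 25.0,
                     n_steps: int = 28, step_ps: float = 0.2,
                     timestep_fs: float = 1.0):
    """Return the temperature of each cooling segment, the total duration in ps,
    and the number of integration steps per segment."""
    temps = [t_start - step_k * (k + 1) for k in range(n_steps)]
    total_ps = n_steps * step_ps
    md_steps_per_segment = round(step_ps * 1000.0 / timestep_fs)
    return temps, total_ps, md_steps_per_segment


if __name__ == "__main__":
    temps, total_ps, md_steps = cooling_schedule()
    print(temps[0], temps[-1])     # first and last segment temperatures
    print(round(total_ps, 3))      # total cooling time in ps
    print(md_steps)                # 1 fs steps per 0.2 ps segment
```

The ladder ends at 300 K after 5.6 ps, in agreement with the figures quoted for stage iv.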
An ensemble of structures was initially calculated after the
sequential assignments were almost completed and about 300
distance constraints had been collected. The simulations were then
repeated several times during structure refinement. The final round
of SA contained 50 simulations out of which 35 converged yielding
low energy structures. An average SA structure (SA) was
calculated from the 35 SA structures by averaging superimposed
coordinates. The average structure was also minimized (SA)
using the same potential as in stage v of the SA protocol. The
structures were analyzed with respect to the precision of atomic
positions and dihedral angles, constraint violations, deviations from
idealized bond geometries and non-bonded interaction potentials,
and further characterized with respect to dihedral angle
conformations and hydrogen bonding. Dihedral angle order
parameters, S, reflecting the precision of the corresponding
dihedral within the ensemble, were calculated according to Hyberts
et al. A value of S approaching unity indicates a very well-defined
dihedral angle whereas an isotropic distribution yields
S = 0 (but S = 0 must not necessarily reflect an isotropic
distribution). The ensemble of SA structures was also searched
for additional intermolecular hydrogen bonds using the following
two criteria: the distance between the donor hydrogen and
acceptor oxygen and the two heavy atoms must be less than 2.5
Å and 3.5 Å, respectively. Hydrogen bonds mentioned in the text
fulfil these criteria in at least 18 of the 35 SA structures. Structural
r.m.s. differences quoted in the text refer to comparisons with
the average structure (SA). It should be noted that r.m.s.
difference comparisons containing 'all atoms' can sometimes be
erroneous and too large due to the specific atom labelling of phenyl
and tyrosyl rings and carboxylate groups. This is because the
computer program evaluating r.m.s. differences does not always
consider the inherent symmetry of these groups and therefore
can give a large r.m.s. difference even in the case of perfect overlap
(P. Kraulis, personal communication).
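
The dihedral angle order parameter referred to above is commonly computed as the length of the mean of the unit vectors whose phases are the dihedral angles observed across the ensemble. The sketch below follows that definition; the function name and the example angle sets are illustrative and are not taken from the Sso7d data.

```python
import math


def dihedral_order_parameter(angles_deg) -> float:
    """Length of the mean unit vector of a set of dihedral angles (degrees).
    1 means identical angles; 0 means the unit vectors cancel."""
    n = len(angles_deg)
    if n == 0:
        raise ValueError("at least one angle is required")
    sum_x = sum(math.cos(math.radians(a)) for a in angles_deg)
    sum_y = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.hypot(sum_x, sum_y) / n


if __name__ == "__main__":
    print(dihedral_order_parameter([-60.0] * 35))               # identical angles
    print(dihedral_order_parameter([0.0, 90.0, 180.0, 270.0]))  # evenly spread
    print(dihedral_order_parameter([0.0, 180.0] * 10))          # two opposite states
```

The last case illustrates the caveat in the text: two equally populated opposite conformations also give a vanishing order parameter although the distribution is not isotropic.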
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abstract: The genes for two Sac7 DNA-binding proteins, Sac7d and Sac7e, from the extremely 
thermophilic archaeon Sulfolobus acidocaldarius have been cloned into Escherichia coli and sequenced.
The sac7d and sac7e open reading frames encode 66 amino acid (7608 Da) and 65 amino acid (7469 Da)
proteins, respectively. Southern blots indicate that these are the only two Sac7 protein genes in S.
acidocaldarius, each present as a single copy. Sac7a, b, and c proteins appear to be carboxy-terminal
modified Sac7d species. The transcription initiation and termination regions of the sac7d and sac7e genes
have been identified along with the promoter elements. Potential ribosome binding sites have been
identified downstream of the initiator codons. The sac7d gene has been expressed in E. coli, and various
physical properties of the recombinant protein have been compared with those of native Sac7. The UV
absorbance spectra and extinction coefficients, the fluorescence excitation and emission spectra, the circular
dichroism, and the two-dimensional double-quantum filtered 1H NMR spectra of the native and recombinant
species are essentially identical, indicating essentially identical local and global folds. The recombinant
and native proteins bind and stabilize double-stranded DNA with a site size of 3.5 base pairs and an
intrinsic binding constant of 2 x 10^7 M⁻¹ for poly[dGdC]-poly[dGdC] in 0.01 M KH2PO4 at pH 7.0. The
availability of the recombinant protein permits a direct comparison of the thermal stabilities of the
methylated and unmethylated forms of the protein. Differential scanning calorimetry demonstrates that
the native protein is extremely thermostable and unfolds reversibly at pH 6.0 with a Tm of approximately
100 °C, while the recombinant protein unfolds at 92.7 °C.



Small basic DNA-binding proteins have been isolated from
various archaea, some of which have been shown to be
associated with the nucleoid or chromatin and presumably
perform a histone-like or helix-stabilizing function in these
organisms (Searcy, 1975; Stein & Searcy, 1978; Searcy &
Delange, 1980; Thomm et al., 1982; Grote et al., 1986; Lurz
et al., 1986; Choli et al., 1988a,b; Reddy & Suryanarayana,
1989; Sandman et al., 1990), although the actual function
of many of these proteins has not been demonstrated. HTa
protein from the thermophilic archaeon Thermoplasma
acidophilum shows considerable homology to eukaryotic
histones and Escherichia coli HU protein (Searcy, 1975;
Searcy & Delange, 1980). HMf1 and HMf2, two DNA
binding proteins from Methanothermus fervidus, are also
homologous to some of the eukaryotic histones (Sandman
et al., 1990).

Sulfolobus, a thermoacidophilic archaeon, expresses a
number of small basic DNA-binding proteins ranging in
molecular weight from 7000 to 10 000 (Kimura et al., 1984;
Grote et al., 1986; Choli et al., 1988a). These have no
apparent homology to any of the histones. Much of the early
work on these proteins resulted from a search for chromatin
proteins that might stabilize the genomic DNA at the high
growth temperature. Sulfolobus acidocaldarius grows optimally
in the range of 70-80 °C, while Sulfolobus solfataricus
grows optimally at approximately 75-85 °C. The
G+C base composition of Sulfolobus DNA is about 40%,
and its cellular salt concentration is relatively low, making
a helix-stabilizing protein presumably necessary (Reddy &
Suryanarayana, 1988). The 7 kDa class of proteins has been
presented as a likely candidate given that they are present
in relatively large amounts in the cell (Grote et al., 1986;
Choli et al., 1988a,b).

Five proteins have been isolated in the 7 kDa class from
S. acidocaldarius (Kimura et al., 1984; Choli et al., 1988b),
and have been labeled Sac7a¹ through Sac7e, in order of
increasing basicity. Four of these, Sac7a, b, d, and e, have
been sequenced (Figure 1) (Kimura et al., 1984; Choli et
al., 1988b), and only minor differences among them have
been noted. The sequence of Sac7c has not been reported.
The number of genes encoding the 7 kDa proteins of S.
acidocaldarius has not been determined. Comparison of the
amino acid sequences indicates that there must be at least
two separate genes coding the 7d and 7e species. The high
degree of similarity observed in the primary sequence of the
7d and 7e proteins suggests that two genes arose through
gene duplication. Sac7a and Sac7b are truncated versions
of the Sac7d protein, most likely resulting from truncated
genes, posttranslational processing, or degradation during
isolation.

¹Abbreviations: DSM, Deutsche Sammlung für Mikroorganismen;
IPTG, isopropyl β-D-thiogalactopyranoside; NMR, nuclear magnetic
resonance; COSY, correlation spectroscopy; DQF-COSY, double-quantum
filtered correlation spectroscopy; DSC, differential scanning
calorimetry; CD, circular dichroism; Sac7, a group of 7 kDa DNA-binding
proteins from Sulfolobus acidocaldarius, individually referred
to as Sac7a, Sac7b, Sac7c, Sac7d, and Sac7e, in order of increasing
basicity; Sso7, a group of 7 kDa DNA-binding proteins from Sulfolobus
solfataricus.

Specific ε-amino monomethylation of lysines 4 and 6 is
characteristic of the Sac7a, b, and d proteins, while Sac7e is
monomethylated at lysines 6, 62, and 63 (residue 4 is an
arginine in Sac7e) (Kimura et al., 1984; Choli et al., 1988b).
No lysine methylation has been detected in the C-terminus
of Sac7a, b, or d, presumably since there are no lysines at
positions 62 and 63 in these proteins, although Sac7d
contains lysines at positions 64 and 65. The Sso7d protein
from S. solfataricus is monomethylated at lysines 4 and 6
and also at lysines 62, 64, and 65 (Choli et al., 1988a). The
role of lysine monomethylation has not been determined but 
is most likely nontrivial given the specificity (there are 12- 
14 lysines in these proteins) and the occurrence in both S. 
acidocaldarius and S. solfataricus proteins. Baumann et al. 
(1994) have recently shown that an increase in Sso7d 
methylation occurs upon heat shock and indicate that 
methylation may be directly related to protein stability. 
However, methylation may be an incidental response to an 
increase in methylase activity directed at other processes. 
Methylation may also increase the reversibility of the 
unfolding process rather than changing the stability. A direct 
calorimetric measurement of the unfolding and stability of 
these proteins has not been reported. 

The Sac7 proteins would appear to be ideal models for
studies of protein folding and stability given their small size,
the absence of cysteine, and expected high thermostability.
Biophysical analysis of these proteins is hampered, however,
by the inability to selectively isolate a homogeneous isoform
in large quantities. The differential methylation of individual
7 kDa proteins could further complicate quantitative studies
of structure and stability as well as DNA binding. Therefore,
we have cloned and expressed the gene encoding the Sac7d
species in E. coli to facilitate elucidation of the solution
structure of the protein by NMR with high resolution, probing
of the thermostability and DNA-binding properties of the
protein by site-directed mutagenesis, and determination of
the role of methylation. The availability of recombinant
protein allows for a direct comparison of the stability of the
methylated and unmethylated proteins. In the process of
cloning the sac7d gene, the gene for Sac7e has also been
cloned and sequenced; and we have delineated the transcription
initiation and termination regions of the sac7d and sac7e
genes along with the promoter elements.

An initial structure of the native Sso7d protein has been
recently published by Baumann et al. (1994), and a high-resolution
structure of the homogeneous, recombinant Sac7d
protein has been completed (Edmondson, Qiu, and Shriver,
manuscript submitted). There are significant differences
between these structures, and it remains to be determined if
these can be attributed to sequence differences, lysine
methylation, or quality of data due to heterogeneity in the
native preparation. The spectroscopic, DNA binding, and
calorimetric comparisons of the native and recombinant Sac7
proteins reported here indicate little difference in structure,
but significant difference in thermostability.

MATERIALS AND METHODS

Strains of Microorganisms. E. coli strain DH5αF'IQ [F'
lacIqZΔM15 Δ(lacZYA-argF) recA1 hsdR17(rK- mK+)] was
purchased from Gibco BRL. E. coli strains HMS174 (F-
recA rK12- mK12+), BL21 (F- ompT rB- mB-), and their
derivatives were generous gifts from F. William Studier
(Studier et al., 1990). E. coli strain CJ236 (dut- ung-) was
obtained from Jack Parker (Southern Illinois University,
Carbondale, IL). S. solfataricus P2 and S. acidocaldarius
DG6 were gifts from Dennis Grogan (Grogan, 1989, 1991).
S. acidocaldarius (DSM 639) and S. solfataricus P1 (DSM
5354) were purchased from Deutsche Sammlung für Mikroorganismen
(DSM).

The Sulfolobus strain used here was received from W.
Zillig (originally called S. solfataricus P1). We have isolated
a single colony of our organism on solid medium (Grogan,
1989) and have compared the HindIII, EcoRI, and SalI
restriction fragment patterns of its genomic DNA with two
strains of S. acidocaldarius (DG6 and DSM639) and two
strains of S. solfataricus (DSM5354 and P2) according to
Grogan (1989). In each case the restriction pattern of our
organism is identical to the S. acidocaldarius strains and is
distinctly different from the S. solfataricus strains. This has
been further substantiated by Southern analysis of genomic
DNA using Sac7 protein gene specific oligonucleotides (see
Results). We have designated our laboratory strain as S.
acidocaldarius RGXM. There has been confusion in the
literature regarding the identity of the strains of two
Sulfolobus species used in various laboratories at different
times. Zillig (1993) has recently addressed this issue and
tried to clarify the confusion.

Growth of Microorganisms. E. coli strains were grown
in Luria Bertani media (1% bactotryptone/1% NaCl/0.5%
yeast extract) by standard methods (Sambrook et al., 1989).
Small scale cultures of Sulfolobus (10-200 mL) were grown
in Brock's medium (Brock et al., 1972) at 75 °C, supplemented
with 0.2% sucrose. Large scale Sulfolobus cultures
were grown either in 10 L polypropylene carboy at 78 to [...]
°C or in a 16 L VirTis glass fermentor at 70-72 °C with
vigorous aeration using DeRosa's medium (DeRosa &
Gambacorta, 1975) supplemented with 0.1% glucose and
0.1% glutamic acid.

Enzymes and Chemicals. Restriction enzymes, alkaline
phosphatase, T4 DNA ligase, T4 DNA polymerase and T4
polynucleotide kinase were purchased from New England
Biolabs, Brisco Ltd., BRL, or United States Biochemical Co.
[32P]H3PO4 and 5'-[α-35S]adenosine thiotriphosphate
triethylammonium salt were purchased from ICN Biochemicals
Inc. and Amersham Co., respectively. Sequenase version
2.0 DNA sequencing kit was obtained from United States
Biochemical Co. Specific deoxyoligonucleotides were purchased
from Research Genetics. The list of the oligonucleotides
used in this work is presented in Table 1. Difco
bacterial media were purchased from Fisher Scientific.
CM52 was obtained from Whatman and Sephacryl S-[...]
HR from Sigma Chemical Co. All other chemicals were
reagent grade and obtained primarily from Fisher Scientific,
J. T. Baker Co., and Sigma Chemical Co. Laboratory water
was routinely purified to 18.3 MΩ resistance with a recycling
Barnstead Nanopure system.

Genomic DNA Isolation. Cells from 10-20 mL cultures
were pelleted and resuspended in 0.2-0.3 mL of 10 mM
Tris-HCl, pH 8.0/1 mM EDTA/1% SDS. This solution was
extracted once each with equal volumes of phenol, phenol/
chloroform/isoamyl alcohol (25:24:1), and chloroform/isoamyl
alcohol (24:1). Sodium acetate (3 M, pH 5.2) was added to
the final aqueous phase to a concentration of 0.3 M, followed
by DNA precipitation with three volumes of ice-cold ethanol.
The DNA was spooled onto a thin glass rod, washed in 70%
ethanol, and air dried. The DNA was dissolved in 10 mM
Tris-HCl, pH 8.0/1 mM EDTA.
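The sodium acetate step above is a simple dilution calculation. The sketch below shows one way to work it out; the function name, the 0.3 mL aqueous-phase volume, and the reading of "three volumes" as three times the combined aqueous volume are illustrative assumptions, not details given in the text.

```python
def stock_volume_for_final_conc(sample_volume, stock_conc, final_conc):
    """Volume of stock to add so that the mixture reaches final_conc.

    Solves stock_conc * v = final_conc * (sample_volume + v) for v.
    The result is in the same units as sample_volume.
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("final_conc must lie between 0 and stock_conc")
    return sample_volume * final_conc / (stock_conc - final_conc)


# Hypothetical aqueous phase of 0.3 mL after the extractions.
aqueous_ml = 0.3
# 3 M sodium acetate stock brought to 0.3 M final (values from the text).
acetate_ml = stock_volume_for_final_conc(aqueous_ml, stock_conc=3.0, final_conc=0.3)
# Assumption: "three volumes" refers to the aqueous phase plus the added acetate.
ethanol_ml = 3 * (aqueous_ml + acetate_ml)

print(f"add {acetate_ml:.4f} mL of 3 M sodium acetate, then {ethanol_ml:.2f} mL ethanol")
```

For a 3 M stock and a 0.3 M target the added volume is always one ninth of the sample volume, which is the familiar "one tenth of the final volume" rule.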

Cloning, Hybridization, and Sequencing. The preparation
of a PstI genomic library of S. acidocaldarius RGJM in E.
coli strain DH5αF'IQ and screening of the library by colony
hybridization was according to published procedures (Berger
& Kimmel, 1987; Sambrook et al., 1989). Southern and dot
blot hybridizations were carried out using nitrocellulose
membranes according to the manufacturer's protocols
(Schleicher & Schuell) which are based on the method of
Southern (1975). The preparation of [γ-32P]ATP and 5'-32P-end-labeling
of oligonucleotides was by standard methods
(Johnson & Walseth, 1979; Gupta, 1984; Sambrook et al.,
1989). DNA was sequenced by the dideoxy chain termination
method (Sanger et al., 1977) using a Sequenase version
2.0 kit. The final sequences were determined from both
strands. The standard universal primers for Stratagene's
pBluescript vectors (Short et al., 1988) and specifically
synthesized oligonucleotides were used in sequencing reactions.
DNA sequences were analyzed using the computer
program DNA Inspector IIe (Textco Co.).

Primer Extension. Total RNA from S. acidocaldarius
RGJM was isolated by previously published procedures
(Emory & Belasco, 1990). The primer extension assay was
conducted as described in the Promega "Protocol and
Applications" manual.

Oligonucleotide-Directed Mutagenesis. Procedures for the
oligonucleotide directed mutagenesis were those outlined in
the Bio-Rad Muta-Gene manual and are based on Kunkel's
method (Kunkel et al., 1987) using E. coli dut- ung- strains.
We were unable to propagate the substrate for oligonucleotide
directed mutagenesis, pBluescript KS+/sac7d (see
Results for the description and nomenclature of the plasmids),
in E. coli strain CJ236 (dut- ung-). Therefore, we used
DH5αF'IQ as the host cell for the production of single-stranded
template and as the recipient for transformation with
mutagenized plasmid and modified the procedure for the
selection of mutant plasmid. Colonies arising from transformation
with the plasmids from the mutagenesis reaction
to create the NdeI site were pooled and grown as a mixed
culture. Plasmids isolated from these cells were digested
with NdeI and separated on a 0.8% agarose gel. Linear
plasmids were isolated from the gel, recircularized, and again
used to transform DH5αF'IQ. Plasmids were then extracted
from individual colonies and screened for the presence of
the NdeI restriction site by digestion with the enzyme. Final
confirmation of the desired mutation in the plasmids was
obtained by sequencing.

Gene Expression. For gene expression, pET-3b/sac7d was
transformed into E. coli strain BL21 (DE3) pLysS (Studier
et al., 1990). For protein isolation, a 10 mL culture of this
transformant was grown overnight in LB broth containing
ampicillin (200 μg/mL) and chloramphenicol (27 μg/mL).
From this, 0.6-1 mL was used to inoculate 50 mL of fresh
medium. At an A600 of 0.3-0.6, 25 mL of the culture was
diluted into 1 L of new medium. The culture was induced
upon reaching an A600 of 0.8-0.95 by adding IPTG to a final
concentration of 0.4 mM. A small aliquot of each culture
was taken prior to induction to assay for expression and
plasmid stability as described by Studier et al. (1990).
Cultures were harvested at 1 h postinduction and stored at
-70 °C.

Protein Isolation and Purification. E. coli cells containing
recombinant protein were thawed slowly and resuspended
in 100 mL of 10 mM Tris-HCl, pH 7.5/0.5 mM phenylmethanesulfonyl
fluoride, and the cells were lysed by
repeated freezing and thawing along with brief sonication
on ice. To isolate native protein, Sulfolobus cells were
suspended in 0.05 M KH2PO4 buffer (pH 6.8) and lysed by
sonication on ice. DNase I (20 mg/100 mL) was added to
lysed cells, and the suspension was incubated at 37 °C for 5
min followed by centrifugation at 280000g for 60 min. The
supernatant was cooled on ice and dialyzed in SpectraPor
CE 1000 MWCO tubing against 0.2 M H2SO4 overnight at
4 °C. The resulting precipitate was removed by centrifugation
at 180000g for 30 min, and the supernatant was dialyzed
four times against 20 mM Tris-HCl, pH 7.4/1 mM EDTA.
A small amount of precipitate was removed by centrifugation,
and the supernatant was applied to a CM-52 ion exchange
column equilibrated with 20 mM Tris-HCl (pH 7.4). The
protein was eluted with a linear NaCl gradient (0.0-0.3 M)
with both the native and recombinant Sac7 proteins giving
a primary peak at approximately 0.2 M NaCl. Further
purification was accomplished by gel exclusion chromatography
on Sephacryl S-100-HR in 0.02 M Tris-HCl (pH 7.4).

The identity and purity of the 7 kDa proteins were
monitored by nonreducing SDS gel electrophoresis (Schagger
& von Jagow, 1987). The recombinant protein showed a
single band that comigrated with the mixture of Sac7 native
proteins isolated from S. acidocaldarius (Figure 2) and was
absent in preparations from control E. coli cells lacking the
recombinant plasmid (data not shown). The Sso7 proteins
run slightly ahead of Sac7 proteins, consistent with a
molecular weight of 7020 (calculated from the sequence).
The Schagger-von Jagow gel used here did not resolve the
individual Sac7 and Sso7 native species. The identity of
the recombinant Sac7d protein was confirmed by comparison
of the double-quantum filtered COSY spectra of native Sac7
and recombinant Sac7d proteins (see below) and by the
consistency of the sequence specific 1H NMR assignments
with the expected sequence (Edmondson, Qiu, and Shriver,
submitted).

In earlier studies the recombinant protein was isolated by
a different procedure (McAfee, 1993). E. coli cells were
lysed and DNase treated as above but without sonication.
The pH of the supernatant was adjusted to 1.5 with 5 M
H2SO4. After 45 min on ice and centrifugation, the
supernatant was neutralized with 10 N NaOH. The mixture
was incubated in a water bath at 70 °C for 2 h, followed by
centrifugation. The supernatant was dialyzed three times
with 1 mM NaH2PO4 buffer (pH 7.0) followed by CM-52
chromatography as above.

Molecular Weight Determination. Approximate molecular
weights of the native and recombinant Sac7 proteins were
determined by gel exclusion chromatography on Sephacryl
S-100-HR. Cytochrome c, myoglobin, carbonic anhydrase,
and bovine serum albumin were used as molecular weight
standards, and blue dextran and DNP-alanine were used to
measure the column void and total volumes, respectively.
The molecular weights were determined as described by
Mayes (1984).
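A common way to turn such a calibration into a molecular weight is to fit the partition coefficient Kav = (Ve - V0)/(Vt - V0) against log10(MW) for the standards and interpolate the sample. The sketch below illustrates that general approach only; whether it matches the procedure of Mayes (1984) is an assumption, the standard molecular weights are nominal values (only the myoglobin value appears in this paper), and all elution volumes are invented.

```python
import math


def kav(elution_volume, void_volume, total_volume):
    """Partition coefficient Kav = (Ve - V0) / (Vt - V0)."""
    return (elution_volume - void_volume) / (total_volume - void_volume)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(elution_volume, void_volume, total_volume, standards):
    """Interpolate a molecular weight from (MW, elution volume) standards."""
    xs = [math.log10(mw) for mw, _ in standards]
    ys = [kav(ve, void_volume, total_volume) for _, ve in standards]
    slope, intercept = fit_line(xs, ys)
    sample_kav = kav(elution_volume, void_volume, total_volume)
    return 10 ** ((sample_kav - intercept) / slope)


# Hypothetical calibration run: void volume from blue dextran, total volume
# from DNP-alanine, and four standards as (nominal MW, elution volume in mL).
V0, VT = 40.0, 120.0
STANDARDS = [(12400, 87.3), (16900, 83.0), (29000, 75.5), (66000, 64.0)]
print(round(estimate_mw(93.0, V0, VT, STANDARDS)))
```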

Phosphorylation and Glycosylation Assays. Phosphate
analysis was performed by the method of Fiske and Subbarow
(Fiske & Subbarow, 1925; Leloir & Cardini, 1957).
Small aliquots of Sac7 (0.95 mL of a 0.5 mg/mL solution
in 0.02 M Tris-HCl, pH 7.0) were incubated at 37 °C for 1
h with 0.05 mL of bovine intestinal alkaline phosphatase
(2.5 mg/mL in 0.01 M Tris-HCl, pH 9.8). The protein was
precipitated with 0.10 mL of concentrated perchloric acid,
incubated on ice for 10 min, and centrifuged for 5 min at
13 000 rpm. To 0.90 mL of supernatant was added 2.0 mL
of distilled water, 1.0 mL of 5 N H2SO4, 1.0 mL of 2.5%
ammonium molybdate, and 0.10 mL of reducing agent
[prepared fresh by dissolving 0.25 g of reducing mixture
(sodium bisulfite, sodium sulfite, and 1-amino-2-naphthol-
4-sulfonic acid in a 46:46:8 ratio) in 10 mL of water]. The
solutions were allowed to stand for 20 min, and the
absorbance was measured at 660 nm. A standard curve was
prepared using known amounts of a 0.01 M KH2PO4 solution.
O-Phosphoserine, treated with alkaline phosphatase as
described for Sac7, gave quantitative recovery of phosphate.

The phenol-sulfuric acid reaction was used to assay
carbohydrate content of Sac7 protein (Dubois et al., 1956;
Hirs, 1967). To 1.0 mL aliquots of Sac7 protein solution
(0.3 mg/mL) was added 0.25 mL of 80% phenol and 2.5
mL of concentrated sulfuric acid. After mixing, the solutions
were left at room temperature for 10 min and then placed in
a 25 °C water bath for 20 min. The absorbance was
measured at 489 nm. Known amounts of α-D-glucose were
used to construct a standard curve.

Protein Extinction Coefficient. Ultraviolet and visible
spectra were recorded on a Cary 210 spectrophotometer at
25 °C. The wavelength accuracy was checked using benzene
vapor and found to be accurate to within ±0.3 nm, and the
absorbance accuracy was checked using potassium chromate
in 0.05 M KOH (Gordon & Ford, 1972) and found to be
accurate to within 1%.

The extinction coefficients of both the native Sac7 and
recombinant Sac7d proteins were determined by measuring
the amino acid concentration using the ninhydrin reaction
(Moore & Stein, 1954) for a sample of known absorbance.
A standard curve was prepared using amino acid standard
H (Pierce Biochemicals) and converted into leucine molar
equivalents. The concentration of amino acid standards was
checked using tyrosine with an extinction coefficient of ε274
= 1340 in 0.1 M HCl. The molar concentration of amino
acid residues in the samples was calculated by dividing
leucine equivalents by the average color yield based on the
amino acid composition (Moore & Stein, 1954). The average
color yields for Sac7d, lysozyme, and RNase A were 1.0,
1.05, and 1.06, respectively. The extinction coefficients of
lysozyme and RNase A standards were checked by this
procedure and found to be within 1% of published values.
The procedure gave an extinction coefficient of 1.03 ± 0.05
mL/(mg·cm) for both native and recombinant proteins.

The extinction coefficients were also determined by the
method of van Iersel et al. (1985) immediately following
chromatography of the proteins on Sephadex G-50 in 0.01
M NaH2PO4 buffer (pH 6.5). A flat (±0.0005 absorbance
units) spectrophotometer baseline was programmed using the
same buffer which had been used to equilibrate the column.
Protein spectra were collected on samples directly from the



gel exclusion column, generally using only those samples
with an absorbance less than 2.0 at 205 nm to minimize the
effects of stray light. The reproducibility of the A280/A205
ratio using different aliquots collected through the protein
peak as it eluted from the column was found to be on the
order of 99%. The linear relationship between the extinction
coefficient at 280 nm and the ratio of the absorbance at 280
and 205 nm was confirmed in our hands using bovine
α-chymotrypsin (Worthington), hen egg white lysozyme
(Sigma), bovine pancreatic ribonuclease A (Sigma), [...]
(Sigma), β-lactoglobulin (Sigma), and bovine serum albumin
(Sigma). A linear fit of the standards yielded a standard
curve such that

\[ \epsilon_{280}^{0.1\%} = 35.76 \, \frac{A_{280}}{A_{205}} - 0.04 \]

with a correlation coefficient of 0.999 and a standard
deviation for the slope of 0.62 and 0.03 for the y intercept.
The extinction coefficients for the native and recombinant
protein were found to be identical with this technique at [...]
mL/(mg·cm) with a standard deviation of 0.008 mL/(mg·cm).
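The standard curve can be applied to a measured spectrum in one line. The sketch below assumes the damaged equation reads as slope 35.76 and intercept -0.04 (the sign of the intercept is an interpretation of the printed text), and the function name and sample absorbances are hypothetical.

```python
def extinction_from_ratio(a280, a205, slope=35.76, intercept=-0.04):
    """Estimate the 0.1% (1 mg/mL) extinction coefficient at 280 nm.

    Applies the linear standard curve eps = slope * (A280 / A205) + intercept.
    The default slope and intercept are read from the standard curve in the
    text; the sign of the intercept is an assumption.
    """
    if a205 <= 0:
        raise ValueError("A205 must be positive")
    return slope * (a280 / a205) + intercept


# Hypothetical absorbances from one column fraction (A205 kept below 2.0).
print(round(extinction_from_ratio(a280=0.045, a205=1.5), 3))
```

Because only the ratio of the two absorbances enters, the estimate does not depend on knowing the protein concentration, which is the appeal of the method.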

The extinction coefficients were also calculated to be [...]
mL/(mg·cm) in 6 M guanidine hydrochloride, based on the
amino acid content of the protein using the procedure of
Edelhoch (Edelhoch, 1967; Gill & von Hippel, 1989),
assuming ε(Tyr) = 1280 M^-1 cm^-1 and ε(Trp) = 5690 M^-1 cm^-1 in
6 M guanidine hydrochloride. An increase in absorbance
of 3.5% was noted upon denaturation of the protein in 6
M GdnHCl, so the calculated extinction coefficient of the
folded protein was corrected to 1.05 mL/(mg·cm). The
estimated error was taken to be ±0.04 with a maximum
of ±0.15 (Gill & von Hippel, 1989).
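The Edelhoch-type calculation can be sketched as follows. The residue counts (two tyrosines, one tryptophan) are read from the Sac7d sequence in Figure 1 and the molecular weight of 7608 is the value quoted for the sac7d product; treating the 3.5% increase on denaturation as a division by 1.035 is an interpretation rather than something the text spells out, and the function name is hypothetical.

```python
def edelhoch_extinction(n_tyr, n_trp, mol_weight, eps_tyr=1280.0, eps_trp=5690.0):
    """Return (molar, per-mg) extinction coefficients at 280 nm.

    molar is in M^-1 cm^-1; per-mg is in mL/(mg*cm), i.e. molar / MW.
    The defaults are the Tyr and Trp values quoted for 6 M guanidine
    hydrochloride. Cystine is ignored (assumed absent in this protein).
    """
    molar = n_tyr * eps_tyr + n_trp * eps_trp
    return molar, molar / mol_weight


# Assumed composition: 2 Tyr and 1 Trp, MW 7608.
molar, denatured = edelhoch_extinction(n_tyr=2, n_trp=1, mol_weight=7608.0)
# Assumption: absorbance is 3.5% higher when denatured, so divide by 1.035.
folded = denatured / 1.035
print(f"{molar:.0f} M^-1 cm^-1, folded estimate {folded:.2f} mL/(mg*cm)")
```

Under these assumptions the folded-state estimate rounds to the 1.05 mL/(mg·cm) reported in the text.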

Circular Dichroism. Circular dichroism spectra of purified
native Sac7 and recombinant Sac7d proteins were measured
at room temperature in a 0.01 cm path length cylindrical
cell on an AVIV 62DS spectropolarimeter. CD data were
collected at 1 nm intervals using averaging times of 1 [...]
s/nm, depending on the signal-to-noise ratio. Relatively high
signal-to-noise ratios made signal averaging of multiple scans
unnecessary. The spectral bandwidth was 1.5 nm. Baselines
were measured using water and subtracted from the sample
CD. Sample concentrations ranged from 0.2 to 0.7 m[...].
Protein concentrations were determined from UV absorption
spectra measured in 1 cm cuvettes. The molar CD per
peptide bond was determined using standard procedures
(Johnson, 1984) along with the UV extinction coefficient
determined above. CD spectra were smoothed as described
by Savitsky and Golay (1964). The CD was calibrated at
290.5 nm with d-camphor-10-sulfonic acid using Δε =
2.36, and the ratio Δε192.5/Δε290.5 was -2.10 (Chen &
Yang, 1977).

The fractions of protein secondary structures were determined
by fitting the CD spectra from 260 to 184 nm at [...]
nm intervals using the variable selection method of Johnson
(Manavalan & Johnson, 1987). The results reported are the
averages plus or minus one standard deviation of all possible
combinations of 22 reference proteins taken 19 at a time
that (1) have secondary structure components greater than
-0.05, (2) have sums of secondary structures between [...]
and 1.1, and (3) have an rms error between measured and
calculated CD spectra less than 0.21 Δε units. The number
of fits meeting these selection criteria was greater than 250
for native and recombinant protein.
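To give a sense of scale, the number of 19-protein subsets of 22 reference proteins can be counted directly, and the three acceptance criteria can be written as a small filter. This is only a sketch of the selection step, not of the variable selection fit itself; the lower bound of 0.9 on the sum of fractions is a placeholder because that number is lost in the text, and the function name is hypothetical.

```python
from itertools import combinations
from math import comb


def passes_selection(fractions, rms_error,
                     min_component=-0.05, sum_range=(0.9, 1.1), max_rms=0.21):
    """Apply the three acceptance criteria to one fit.

    fractions: fitted secondary-structure fractions for one reference subset.
    rms_error: rms difference between measured and calculated CD spectra.
    sum_range[0] = 0.9 is an assumed placeholder, not a value from the text.
    """
    total = sum(fractions)
    return (all(f > min_component for f in fractions)
            and sum_range[0] <= total <= sum_range[1]
            and rms_error < max_rms)


# Number of ways to take 22 reference proteins 19 at a time.
n_subsets = comb(22, 19)
# The same subsets can be enumerated explicitly by index when fitting each one.
subsets = combinations(range(22), 19)
print(n_subsets, next(subsets)[:3])
```

With roughly 1500 candidate subsets, the "greater than 250" accepted fits reported above correspond to a sizeable minority of all combinations.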

Nuclear Magnetic Resonance. NMR spectra were collected
on a Varian 500 MHz NMR spectrometer with the
magnet installed on a TMC Micro-g triangular antivibration
table. All data were collected at 35 °C in 90% H2O/10%
D2O, pH 4.1, with a protein concentration of approximately
10 mM. The pH was adjusted with DCl and NaOD using a
Radiometer glass electrode and was not corrected for the
deuterium isotope effect (Bundi & Wüthrich, 1979). The
chemical shifts are referenced to the water resonance at 4.73
ppm at 35 °C [measured relative to sodium 4,4-dimethyl-
4-silapentane sulfonate (DSS) in a separate experiment
without protein].

Phase-sensitive double-quantum filtered COSY (DQF-COSY)
spectra were collected using standard procedures
(Rance et al., 1983). Typically, 1024 data points were
collected in the t2 domain with 512 increments in the t1
domain, each the sum of 32 scans with a 3 s relaxation delay.
The spectral widths in both dimensions were 6000 Hz. The
water peak was diminished in all experiments by presaturation
during the relaxation delay. Both carrier and decoupler
frequencies were set equal to the water resonance frequency
in all experiments (Zuiderweg et al., 1986).

The NMR data were transferred to a Silicon Graphics 
workstation for Fourier transformation and further data 
manipulation using FELIX 2.1 (BioSym). The data were 
zero-filled to 2048 data points in both dimensions and treated 
with a Lorentzian to Gaussian apodization function prior to 
Fourier transformation. 

Differential Scanning Calorimetry. Differential scanning
calorimetry was performed with a Microcal MC2 calorimeter.
Temperature calibration was monitored using sealed samples
supplied by Microcal. Heat flow accuracy was periodically
monitored by applying pulses of known magnitude using the
internal heater. In addition, ribonuclease A (Sigma, R5250)
was used as a benchmark test protein and shown to unfold
at pH 2.2 [0.1 M KCl, 0.02 M glycine, ε280 = 0.69 mL/
(mg·cm), MW 13 700] with a Tm of 36.0 °C, a ΔHcal of 74.1
kcal/mol, and a ΔHvH of 74.8 kcal/mol (ΔHcal/ΔHvH ratio of
1.00 ± 0.01), in good agreement with the published values
of Tiktopulo and Privalov (1974).

Protein solutions were exhaustively dialyzed against the
indicated buffer overnight. The sample cell was loaded with
1.229 mL of protein solution, and the reference cell was filled
with the last dialysis buffer. Approximately 30 psi of
nitrogen was applied to the cells during each scan to
minimize degassing during heating. Samples were not
degassed, but, instead, the sample was heated repetitively
three times in the DSC instrument by scanning to 35 °C (i.e.,
below any denaturation endotherm), followed by rapid
cooling. This procedure resulted in the flattest and most
reproducible instrumental baselines.

All DSC experiments were under computer control using
an IBM PC computer interfaced to the Microcal MC2
instrument. A scan rate of 1 deg/min was used in all
experiments. The computer interface and data collection
software were supplied by Microcal. Multiple, repetitive
scans were performed on the same sample to check for
reversibility, with identical cooling and equilibration times
between scans.

The DSC raw data, in the form of heat flow (mcal/min)
as a function of temperature, were transferred to a Macintosh
Quadra computer for analysis. The raw data were converted
to excess heat capacity (kcal/deg·mol) by dividing each data
point by the scan rate and the concentration of protein in
the sample cell. All baselines were corrected by subtraction
of DSC scans of the buffer against which the protein had
been dialyzed. The heat capacity data were fit by using in-house
nonlinear least-squares fitting routines to obtain the
midpoint temperature of the transition and both the calorimetric
and van't Hoff enthalpies. The basis of the programs
has been described elsewhere (Shriver & Kamath, 1990).
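The unit conversion described above can be sketched as follows. This is an illustration, not the in-house routine: the function name is hypothetical, the amount of protein is taken as molar concentration times cell volume (the text mentions only concentration, so the explicit cell volume is an assumption), and the sample numbers are invented.

```python
def excess_heat_capacity(sample_flow, buffer_flow, scan_rate, conc_molar, cell_volume_ml):
    """Convert DSC heat flow to excess heat capacity in kcal/(deg*mol).

    sample_flow, buffer_flow: heat flow in mcal/min at matching temperatures.
    scan_rate: deg/min. conc_molar: protein concentration in mol/L.
    cell_volume_ml: sample cell volume in mL (assumed to be needed to get moles).
    """
    moles = conc_molar * cell_volume_ml / 1000.0
    result = []
    for sample, buffer in zip(sample_flow, buffer_flow):
        mcal_per_deg = (sample - buffer) / scan_rate  # baseline-corrected
        result.append(mcal_per_deg * 1e-6 / moles)    # 1 kcal = 1e6 mcal
    return result


# Hypothetical three-point excerpt of a scan at 1 deg/min.
cp = excess_heat_capacity(
    sample_flow=[0.10, 0.45, 0.12],
    buffer_flow=[0.02, 0.03, 0.03],
    scan_rate=1.0,
    conc_molar=2.0e-4,
    cell_volume_ml=1.2,
)
print([round(value, 3) for value in cp])
```

Subtracting the buffer scan before scaling keeps the baseline correction and the unit conversion in a single pass over the data.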

Fluorescence. Fluorescence titration measurements were
performed on an SLM 8000C spectrofluorimeter with 4 nm
excitation and 8 nm emission slit widths. Binding titrations
were performed with excitation at 295 nm and emission
monitored at 350 nm. Reverse titrations were performed by
adding aliquots of concentrated nucleotide solutions to a
known concentration of protein in a 4 mL fluorescence quartz
cell with stirring using a magnetic "flea" within the cell.
Nucleic acid concentrations were determined spectrophotometrically
using an extinction coefficient of 8400 L/(cm·mol)
for poly[dGdC]·poly[dGdC] (Wells, 1970) and 6600
L/(cm·mol) for poly[dAdT]·poly[dAdT] (Inman, 1962). All
experiments were performed at 25 °C. The fluorescence
intensity was constant at high DNA concentrations, and thus
no correction was made for the inner filter effect. Apparently,
any decrease in fluorescence due to the inner filter
effect was balanced by other effects, such as scattering by
the DNA-protein complexes. Photobleaching was not observed
during the titrations. Binding parameters were
obtained by using a simple, noncooperative McGhee-von
Hippel model (McGhee & von Hippel, 1974).
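For reference, the noncooperative McGhee-von Hippel isotherm relates the binding density ν (bound ligand per lattice residue) to the free ligand concentration L through ν/L = K(1 - nν)[(1 - nν)/(1 - (n - 1)ν)]^(n-1), where K is the intrinsic association constant and n the site size. The sketch below solves that equation for ν by bisection; the function names and the parameter values are hypothetical, and this is not the fitting code used in the paper.

```python
def mvh_nu_over_l(nu, k_assoc, site_size):
    """Right-hand side of the noncooperative McGhee-von Hippel equation (nu / L)."""
    free = 1.0 - site_size * nu
    return k_assoc * free * (free / (1.0 - (site_size - 1) * nu)) ** (site_size - 1)


def binding_density(free_ligand, k_assoc, site_size, tol=1e-12):
    """Solve nu = L * f(nu) for nu on (0, 1/n) by bisection."""
    lo, hi = 0.0, 1.0 / site_size
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if free_ligand * mvh_nu_over_l(mid, k_assoc, site_size) > mid:
            lo = mid  # predicted binding exceeds mid, so the root lies above
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical parameters: K = 1e6 M^-1, site size of 4 residues, 10 uM free ligand.
nu = binding_density(free_ligand=1.0e-5, k_assoc=1.0e6, site_size=4)
print(f"binding density = {nu:.4f} (saturation limit {1 / 4:.2f})")
```

With a site size of 1 the expression reduces to a simple Langmuir isotherm, which is a convenient sanity check on any implementation.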

DNA Stabilization. Thermal denaturation studies of DNA
and DNA-protein complexes were performed on a Cary 210
spectrophotometer equipped with water-jacketed cuvette
holders and a circulating water bath calibrated to within ±0.3
°C. Melting curves are scaled to an A260 of 1.6 at 20 °C for
the DNA component of DNA-protein mixtures.

Sequence Analysis. BLAST (Altschul et al., 1990) searching
and alignment were performed using the NCBI server
(blast@ncbi.nlm.nih.gov) against the "nr" (nonredundant)
sequence database (including Brookhaven Protein Data Bank,
January 1994 release; SWISS-PROT Release 29.0, June
1994; PIR Release 41.0, June 30, 1994; CDS Translations
from GenBank Release 83.0, June 15, 1994; Kabat Sequences
of Proteins of Immunological Interest Release 5.0, August
1992; TFD Transcription Factor Database Release 7.6, June
1993). BLITZ and FASTA searches of the latest SWISS-PROT
database were performed using the EMBL servers
(blitz@embl-heidelberg.de and fasta@embl-heidelberg.de).
Database retrieval was performed using the GDB/Accessor
(Johns Hopkins University) available from ftp.gdb.org.
MacPattern (Fuchs, 1991) (fuchs@embl-heidelberg.de) was
utilized for BLOCKS (Henikoff & Henikoff, 1991) and
PROSITE (Bairoch, 1992) analysis on a Quadra 700
(BLOCKS database Version 7.01 was utilized with 2679
entries and PROSITE database version 12.0, June 1994, was
used with 1021 entries, both obtained from the NCBI ftp
site ncbi.nlm.nih.gov.) The MacVector software package
(IBI) was utilized for protein secondary structure analysis.

Table 1: List of Oligonucleotides

oligonucleotide (a)   sequence (b)                          position (c)
A                     NArYTPYTTVTrYTrNrP                    230-247
B                     i ^ /AV, I 1LI 1 1 I JLI 1 k-JNV^k^   218-237
C                     GGGCiTArrRTTRTfRTPRTANnTR A A rf      296-317
D                     TCTTAACAAATTATTTTATTT                 398-418
E                     GCCCTTTATACCTTCCCCTTA                 398-418
F                     CCTGTCTTACCATTGTCGTC                  305-324
G                     CCTTCACCATATGAGGTCAAGTTAT (e)         187-212
H                     GACTTAACTTAATACCG                     143-159

(a) Oligonucleotides A, B, and C were derived from amino acids 9-14,
5-11, and 31-38, respectively, of the Sac7 proteins (Figure 1). These
amino acid sequences are identical in the four Sac7 proteins. (b) N = A,
G, C, or T; Y = C or T; R = A or G. (c) Nucleotide positions correspond
to those in Figure 3. Sequences of oligonucleotides A, C, D, E, F,
and G are complementary to the sequences shown in Figure 3.
Oligonucleotides D and E correspond to the same positions (Figure 3)
for sac7d and sac7e, respectively. (d) Oligonucleotides B and C have
six and four additional nucleotides, respectively, at the 5' termini which
are not derived from the amino acid sequence of the protein. (e) Sequence
of the primer used for oligonucleotide directed mutagenesis. The
underlined G replaces a T in the sac7d gene sequence creating an NdeI
restriction site.
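The single-base change behind the NdeI site can be checked with a few lines of string handling. The sketch below uses oligonucleotide G as it is printed in Table 1 and the two hexanucleotides named in the Results (AATATG in the gene, CATATG after mutagenesis); the helper names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatches(a, b):
    """List (index, base_in_a, base_in_b) for every position where a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


NDEI_SITE = "CATATG"   # NdeI recognition sequence (palindromic)
WILD_TYPE = "AATATG"   # hexanucleotide containing the initiation codon
OLIGO_G = "CCTTCACCATATGAGGTCAAGTTAT"  # mutagenic primer as printed in Table 1

# On the coding strand the change is a single A -> C at the first position.
print(mismatches(WILD_TYPE, NDEI_SITE))
# On the complementary strand, which is the strand the primer represents,
# the same change appears as T -> G, matching the table footnote.
print(mismatches(reverse_complement(WILD_TYPE), reverse_complement(NDEI_SITE)))
# The primer itself carries the new site.
print(NDEI_SITE in OLIGO_G)
```

Because CATATG is its own reverse complement, the site is present whichever strand of the mutagenized plasmid is read.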

RESULTS 

Gene Cloning and Sequence. PstI digested genomic DNA
of S. acidocaldarius RGJM was shotgun cloned in the vector
pUC19 and transformed into E. coli DH5αF'IQ. Approximately
10 000 transformants were screened by colony
hybridization to a mixed oligonucleotide probe (oligonucleotide
A, Table 1) derived from residues 9-14 of the
published amino acid sequence of the S. acidocaldarius 7
kDa proteins (Kimura et al., 1984; Choli et al., 1988a). [The
published amino acid sequences for Sac7a, b, d, and e are
identical over this range (Figure 1) as well as over the ranges
for oligonucleotides B and C.] Tentative positive clones
were restreaked onto selective media and screened a second
time with the same probe. Plasmids isolated from a number
of these positive clones were then independently hybridized
to three different mixed probes (oligonucleotides A, B, and
C, Table 1) by dot blot hybridization. Two clones were
isolated which hybridized to all three probes. Plasmids
isolated from these cells were partially sequenced using
oligonucleotide B as a primer. One of the genes corresponded
with the published protein sequence for the
carboxy-terminal half of the Sac7d protein of S. acidocaldarius
(Kimura et al., 1984; Choli et al., 1988a) with the
exception of one additional lysine at the carboxy terminus,
and the other corresponded to the Sac7e sequence. The genes
which matched the Sac7d and 7e proteins have been
designated sac7d and sac7e, respectively.

Agarose gel analysis of the plasmids carrying the sac7d
(pUC19/sac7d) and sac7e (pUC19/sac7e) genes indicated
that the cloned PstI fragments were greater than 15 kb in
size. Southern blot hybridizations of oligonucleotide C to
the restriction digests of pUC19/sac7d indicated that the sac7d
gene was present on a slightly less than 800 bp EcoRI
fragment. Preliminary sequencing of pUC19/sac7d using
oligonucleotide B as a primer indicated the presence of an
EcoRI site 61 bases downstream of the termination codon
of the protein. Since the published sequence of Sac7d protein
consists of 64 amino acids (Kimura et al., 1984; Choli et
al., 1988a), the second EcoRI site was expected to be
upstream of the start codon. Thus, the EcoRI fragment
hybridizing to probe C was expected to contain the [...]
coding region of the gene. This EcoRI fragment was
subcloned in the vector pBluescript KS+ to produce pBluescript
KS+/sac7d, and the sequence of the sac7d gene was
determined (Figure 3). The sequence of the sac7e gene
(Figure 3) was obtained directly from the pUC19/sac7e using
primers complementary to the coding region of the [...].

The GenBank accession numbers for the sac7d and sac7e
gene sequences reported here are M87569 and LC[...],
respectively.

Sequence Analysis and Gene Copy Number. The start site of
transcription for both sac7d and sac7e genes was determined
using primer extension analysis (Figure 4). Specific primers
(oligonucleotides D and E, Table 1) that were complementary
to residues 398-418 (Figure 3) of the two genes were used.
A single start site was observed for each of the two genes,
which occurs on a guanosine residue eight nucleotides
upstream from the initiation codon. These guanosine
residues are present within perfect archaeal "B box" consensus
sequences (consensus [...]TG[...]) (Zillig et al., 19[...]). A
sequence resembling the archaeal "A-box" motif (consensus
TTTA(A/T)A) is seen 24 and 23 nucleotides upstream from the
transcription start site for the sac7d and sac7e genes,
respectively (Figure 3). The "A-box" of sac7d has a [...]
base match with the consensus sequence, while that for
sac7e has only four matches.

Oligonucleotide F (Table 1) was used to probe genomic
blots of three S. acidocaldarius (RGJM, DG6, and DSM639)
and two S. solfataricus (DSM5354 and P2) strains (Figure
5A). Oligonucleotide F is complementary to a region coding
for residues 34-40 (Figure 1) which are identical for all
S. acidocaldarius 7 kDa proteins (DDNGKTG) and significantly
different from that of S. solfataricus (DEGGGKTG;
two substitutions and an insertion). Two HindIII restriction
fragments (~3.0 and ~4.6 kb) were recognized by the probe
in all three S. acidocaldarius strains, while no hybridization
to the S. solfataricus strains was observed. This observation
reinforces the assignment of the RGJM strain (our laboratory
strain) as an S. acidocaldarius strain. The results indicate
that the putative genes encoding all of the Sac7 proteins are
present on the two HindIII restriction fragments of ~3.0 and
~4.6 kb in size. Genomic blots of EcoRI, HindIII, and [...]
digested S. acidocaldarius RGJM DNA were also probed
with the common oligonucleotide F (Figure 5B), and in each
case hybridization to two bands was observed. One band
in each hybridized to oligonucleotide H, specific for the
untranscribed region upstream of the sac7d gene (Figure [...]).
Results of the hybridizations of various restriction digests
of the original pUC/sac7d and pUC/sac7e clones to appropriate
oligonucleotides (data not shown) corroborate the
results in Figure 5 and also indicated that the original clones
had a single copy of a sac7 gene. The 3.0 and 4.6 kb HindIII
fragments can be correlated with the sac7d and sac7e genes,
respectively. The data indicate that there are only two sac7
genes in the S. acidocaldarius genome, each being present in a
single copy. This reinforces the conclusion that Sac7a and
Sac7b are proteolytically truncated versions of the Sac7d
protein.

Protein Sequence Analysis. The sac7d open reading frame
can encode a 66 amino acid protein, with a calculated
molecular weight of 7608, and the sac7e encodes a 65 amino
acid protein with a calculated molecular weight of

Residues 1-15
Sac7a  Val-Lys-Val-Lys*-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-
Sac7b  Val-Lys-Val-Lys*-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-
Sac7d  Val-Lys-Val-Lys*-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-
Sac7e  Ala-Lys-Val-Arg-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-
Sso7d  Ala-Thr-Val-Lys*-Phe-Lys*-Tyr-Lys-Gly-Glu-Glu-Lys-Glu-Val-Asp-

Residues 16-30
Sac7a  Thr-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Val-Ser-
Sac7b  Thr-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Val-Ser-
Sac7d  Thr-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Val-Ser-
Sac7e  Thr-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Val-Ser-
Sso7d  Ile-Ser-Lys-Ile-Lys-Lys-Val-Trp-Arg-Val-Gly-Lys-Met-Ile-Ser-

Residues 31-45
Sac7a  Phe-Thr-Tyr-Asp-Asp-Asn-Gly-Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-
Sac7b  Phe-Thr-Tyr-Asp-Asp-Asn-Gly-Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-
Sac7d  Phe-Thr-Tyr-Asp-Asp-Asn-Gly-Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-
Sac7e  Phe-Thr-Tyr-Asp-Asp-Asn-Gly-Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-
Sso7d  Phe-Thr-Tyr-Asp-Glu-Gly-Gly-Gly-Lys-Thr-Gly-Arg-Gly-Ala-Val-Ser-

Residues 46-60
Sac7a  Glu-Lys-Asp-Ala-Pro-Lys-Glu-Leu-Leu-Asp-Met-Leu-Ala-Arg-Ala-
Sac7b  Glu-Lys-Asp-Ala-Pro-Lys-Glu-Leu-Leu-Asp-Met-Leu-Ala
Sac7d  Glu-Lys-Asp-Ala-Pro-Lys-Glu-Leu-Leu-Asp-Met-Leu-Ala-Arg-Ala-
Sac7e  Glu-Lys-Asp-Ala-Pro-Lys-Glu-Leu-Met-Asp-Met-Leu-Ala-Arg-Ala-
Sso7d  Glu-Lys-Asp-Ala-Pro-Lys-Glu-Leu-Leu-Gln-Met-Leu-[...]

FIGURE 1: Amino acid sequences of the Sac7a, b, d, and e proteins [after Kimura et al. (1984) and Choli et al. (1988b)] and the Sso7d
protein [after Choli et al. (1988a)]. [Note that the sequence reported by Kimura et al. (1984) was claimed to be for Sso7d but was later
shown to be for Sac7d (Choli et al., 1988a).] Numbering is according to the Sac7d sequence without the initiator methionine. Regions
homologous to the Sac7d protein are outlined. Sac7a, b, and d differ only in length. Lysines which are monomethylated to some extent in
the native protein are indicated with asterisks. The additional C-terminal lysine coded by the sac7d gene described here which was not
indicated in the published protein sequence is enclosed in parentheses.

Gly43 to Ala59. Only the Chou-Fasman algorithm predicts
a small amount of β-sheet (12%) extending from Lys22 to
Lys29 and from Ser31 to Asp36. Reverse turns are predicted
near Asp36 and Gly43. These predictions are not consistent
with the solution structure of the Sac7d protein which has
been determined by 2D NMR (Edmondson, Qiu, and Shriver,
manuscript submitted).

Recombinant Gene Expression. The sac7d gene (in
pBluescript KS+/sac7d) was modified by converting the
hexanucleotide sequence containing the initiation codon
(AATATG) to an NdeI site (CATATG) by oligonucleotide
G (Table 1) directed mutagenesis to produce pBluescript
KS+/sac7d(Nd). The NdeI-BamHI fragment of pBluescript
KS+/sac7d(Nd) carrying the coding region of the sac7d gene
was then subcloned into the NdeI-BamHI site of pET-3b
(Studier et al., 1990) to give pET-3b/sac7d, and transformed
into HMS174 (DE3), HMS174 (DE3) pLysS, BL21 (DE3),
and BL21 (DE3) pLysS (Studier et al., 1990). The plasmid
could be established in all of these strains except BL21
(DE3). Furthermore, in transformed BL21 (DE3) pLysS,
the growth of the organism is impaired and cultures lyse
within 60-70 min after induction with IPTG. On the other
hand, the growth of HMS174 strains was not significantly
affected by the presence of the plasmid, and lysis was not
observed in cultures after 3 h postinduction. The absence
of impaired growth in the presence of the plasmid in these




Figure 2: Schägger and von Jagow (1987) polyacrylamide
nonreducing SDS gel of purified native Sac7 proteins (lane 1),
recombinant Sac7d (lane 2), and native Sso7 (lane 3) proteins
stained with Coomassie Brilliant Blue G-250 (Bio-Rad). The
molecular weight of the Sso7 protein is 7019 based on the published
protein sequence (Choli et al., 1988a), while that of the Sac7d is
7608 based on the DNA sequence presented here. The band
positions of myoglobin (MW 16 900) and insulin (MW 5780) are
indicated for comparison.

(including initiator methionines). Secondary structure analysis
of the sequences of the Sac7d and Sac7e proteins was
performed with both the Chou-Fasman (Chou & Fasman,
1974, 1978) and the Robson-Garnier algorithms (Robson
& Suzuki, 1976; Garnier et al., 1978). Both methods predict
the occurrence of significant α-helix (52%) in both proteins
extending from approximately Lys9 to Lys28 and from
[Figure 3 sequence panel: aligned nucleotide sequences of the sac7d (top) and sac7e (bottom) genes, numbered from 1, with the encoded amino acid sequences, the A-box, and the stop codons marked.]


Figure 3: Nucleotide sequences of the sac7d and sac7e genes.
The top and bottom sequences are the nucleotide sequences for the
sac7d and sac7e genes, respectively (aligned using the coding region
of each gene). Numbering starts with the sac7e sequence. The amino
acid sequence coded for by each gene is shown above (sac7d) or
below (sac7e) each nucleotide sequence. Putative promoter (A- and
B-boxes) and termination elements are underlined in the 5' and 3'
noncoding regions of each sequence. Amino acid and nucleotide
differences in the coding region of each gene are also indicated by
underlines. The G at the start of transcription (in the B-box) for
each gene is indicated with an asterisk.

strains was correlated with a lack of Sac7d protein accumulation.
In contrast to HMS174 strains, BL21 and its
derivatives lack the ompT outer membrane protease and are
deficient in the lonA protease (Studier et al., 1990). The
ompT protease has been shown to be responsible for T7 RNA
polymerase degradation during protein purification from E.
coli (Grodberg & Dunn, 1988). Thus, it appears that in the
absence of stringent regulation of T7 RNA polymerase
synthesis prior to induction with IPTG, or proteolytic
degradation of the Sac7d protein, the protein accumulates
to lethal levels. However, because significant amounts of
the Sac7d protein do not accumulate in HMS174 strains, we
have utilized BL21 (DE3) pLysS for subsequent expression
and purification of the protein.

Spectroscopic and Chemical Characterization. The UV
spectra of native and recombinant Sac7 proteins were
essentially identical, as expected, given the presence of a
single tryptophan and two tyrosines and two phenylalanines
in all proteins. The calculated extinction coefficient based
on amino acid composition is 1.05 mL/(mg·cm) at 280 nm,
in good agreement with the value of 1.03 mL/(mg·cm)
determined by ninhydrin analysis. The extinction coefficients
were also determined by using the ratio of absorbance
at 280 and 205 nm (see Materials and Methods). The




FIGURE 4: Determination of the in vivo start of transcription of
the sac7d and sac7e genes by primer extension analysis. sac7d (lane
d) and sac7e (lane e) specific oligonucleotides D and E, respectively
[which are complementary to residues 398-418 (Figure 3)], were
used to prime the synthesis of a complementary strand of DNA
from total S. acidocaldarius RNA. These same oligonucleotides
were also primers in the dideoxy sequencing reactions used as
markers for the sac7d (pBSKS+/sac7d) and sac7e genes (pUC[illegible]
sac7e) indicated. The sequences written on the left and right are
complementary to the ones observed in the autoradiogram in the
marked region. The start of transcription is indicated in each
sequence by an asterisk. The first five coded amino acids of each
protein are also indicated alongside each complementary strand
sequence.

empirical nature of this method might lead to some question
of its accuracy, but the high correlation of the results from
the six standards is extraordinary (r = 0.999), and the
reproducibility of the A280/A205 ratio measurement is high,
leading to an expected error of 0.6%. The ratio method
demonstrates that the extinction coefficients of the native
and recombinant protein are identical, viz., the mean of the
extinction coefficient measurements (native and recombinant
combined) using this method was 1.18 mL/(mg·cm) with a
standard deviation of 0.008 mL/(mg·cm). The final extinction
coefficient for both the recombinant and native proteins
is taken to be 1.09 mL/(mg·cm), the mean of the three
independent measurements, with a standard error of ±[illegible]
(calculated by propagating the errors of the three measurements).
The extinction coefficient was shown to be pH
independent from 2 to 10.
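
The arithmetic behind the consensus value can be checked with a short Python sketch. It simply averages the three estimates quoted above and applies the Beer-Lambert relation; the helper name `concentration_mg_per_ml` and the 1 cm path length are illustrative assumptions, not part of the original methods.

```python
from statistics import mean

# Extinction coefficients at 280 nm quoted in the text, in mL/(mg·cm).
estimates = {
    "amino acid composition": 1.05,
    "ninhydrin analysis": 1.03,
    "A280/A205 ratio": 1.18,
}

# Unweighted mean of the three independent estimates.
consensus = mean(estimates.values())


def concentration_mg_per_ml(a280, epsilon, path_cm=1.0):
    """Beer-Lambert: protein concentration from absorbance at 280 nm.

    `path_cm` defaults to a 1 cm cuvette, which is an assumption here.
    """
    return a280 / (epsilon * path_cm)


if __name__ == "__main__":
    print(f"mean extinction coefficient: {consensus:.2f} mL/(mg·cm)")
    print(f"A280 = 0.545 -> {concentration_mg_per_ml(0.545, 1.09):.2f} mg/mL")
```

The unweighted mean reproduces the 1.09 mL/(mg·cm) stated in the text; the propagated standard error is not recomputed because the individual uncertainties are not all legible in the source.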
The fluorescence excitation and emission spectra of the
native Sac7 and recombinant Sac7d proteins were also
essentially identical (data not shown). In addition, the
fluorescence emission spectrum was essentially that expected
Figure 5: Southern analysis of Sulfolobus genomic DNA. (A) Autoradiogram of a Southern blot of HindIII digests of genomic DNA from
S. acidocaldarius (RGJM) (lane 1), S. acidocaldarius (DG6) (lane 2), S. acidocaldarius (DSM639) (lane 3), S. solfataricus (DSM5354)
(lane 4), and S. solfataricus (P2) (lane 5) probed with oligonucleotide F. The approximate sizes of the restriction fragments hybridizing to
oligonucleotide F are indicated. (B) Autoradiogram of a Southern blot of EcoRI (lane E), HindIII (lane H), and PstI (lane P) digested S.
acidocaldarius RGJM genomic DNA hybridized with oligonucleotide F. Two closely spaced bands in lane P are clearly evident in the
original autoradiogram. Lane E* is a second independent EcoRI experiment to clearly demonstrate the 0.8 kb fragment. (C) Similar to
panel B except that the DNA was probed with oligonucleotide H.
Figure 6: Circular dichroism spectra of native Sac7 (solid line,
0.26 mg/mL) and recombinant Sac7d (dashed line, 0.66 mg/mL)
proteins in 0.01 M KH2PO4, pH 7.0.

for a free tryptophan, indicating that the single tryptophan
is highly solvent exposed in both proteins. Notably, the
fluorescence emission spectra show a small shift upon
DNA binding (data not shown), indicating that the exposure
of the tryptophan changes slightly upon DNA binding. The
CD spectra of native Sac7 and recombinant Sac7d proteins
were also essentially identical (Figure 6). The variable
selection method of Johnson (Manavalan & Johnson, 1987)
indicates that both the native and recombinant Sac7 proteins
are composed of 31% helix (both α- and 3₁₀-helix),
22-25% β-sheet, 0-2% turn, and 42-45% nonrepetitive
structure.

The DQF-COSY spectra of the native and recombinant
Sac7 proteins are remarkably similar (Figure 7). The native
spectrum shows some additional correlation peaks, most
likely due to the presence of 7a, b, c, d, and e isoforms in
the native preparation and posttranslational modifications
(e.g., monomethylation of lysines) in Sulfolobus. The
essential identity of the chemical shifts for the native and
recombinant proteins indicates again that the recombinant
and native proteins are folded similarly. The extensive
number of alpha protons shifted downfield of the water line
at 4.7 ppm indicates the presence of significant β-sheet
structure (Wishart et al., 1992). The wide chemical shift
dispersion has permitted an essentially complete assignment
of the proton resonances and determination of the solution
structure (Edmondson, Qiu, and Shriver, manuscript submitted).

No phosphorylation or glycosylation of either the native
or recombinant proteins could be detected. The recombinant
protein differs from the native by containing the initiator
methionine. The recombinant protein also contains an
additional C-terminal lysine which was not reported in the
amino acid sequence (Kimura et al., 1984), although it
remains to be determined if this is an error in the protein
sequence or if the lysine is actually removed posttranslationally.

DNA Binding. The binding of Sac7 proteins to DNA is
associated with a significant quenching of the intrinsic
fluorescence of the single tryptophan (Trp23) in both the
native and recombinant Sac7 proteins (Figure 8). Binding
of poly[dGdC]·poly[dGdC] in 0.01 M KH2PO4 at pH 7.0
leads to a maximal fluorescence quenching of the native
protein by 88% and the recombinant Sac7d protein by 87%.
Poly[dAdT]·poly[dAdT] shows a maximal quenching of 84%
for both proteins (data not shown). The binding data can
be fit using the McGhee and von Hippel model (McGhee
and von Hippel, 1974) without cooperative interactions,
assuming a linear relationship between fractional quenching
and protein binding. The poly[dGdC]·poly[dGdC] data can
be fit with an intrinsic association constant of 2 × 10⁷ M⁻¹
for both native and recombinant Sac7d protein and site sizes
of 7 bases (3.5 base pairs) and 6.8 bases for native and
recombinant protein, respectively. Poly[dAdT]·poly[dAdT]
appears to bind slightly more weakly, with an association constant
of 1 × 10⁷ M⁻¹ for both proteins and site sizes of 7.5 bases
for native protein and 6.8 bases for recombinant protein.
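
The fitting procedure described above can be illustrated with a minimal Python sketch of the noncooperative McGhee-von Hippel isotherm. It is a sketch under stated assumptions, not the authors' fitting code: the function names, the bisection solver, and the choice to express the lattice and site size in base pairs are all illustrative, and quenching is taken to be linear in the fraction of protein bound, as the text assumes.

```python
def free_protein(nu, K, n):
    """Free protein concentration (M) at binding density nu.

    Noncooperative McGhee-von Hippel isotherm:
        nu / L = K (1 - n nu) [(1 - n nu) / (1 - (n - 1) nu)]**(n - 1)
    nu is bound protein per lattice unit (here: per base pair),
    K the intrinsic association constant (1/M), n the site size.
    """
    vacant = 1.0 - n * nu
    return nu / (K * vacant * (vacant / (1.0 - (n - 1.0) * nu)) ** (n - 1.0))


def binding_density(protein_total, lattice_total, K, n, iterations=200):
    """Solve the mass balance L_free + nu * lattice_total = protein_total.

    The left-hand side rises monotonically with nu on (0, 1/n),
    so plain bisection is sufficient.
    """
    lo, hi = 0.0, (1.0 - 1e-12) / n
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if free_protein(mid, K, n) + mid * lattice_total > protein_total:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def fractional_quenching(protein_total, lattice_total, K, n, q_max):
    """Quenching assumed proportional to the fraction of protein bound."""
    nu = binding_density(protein_total, lattice_total, K, n)
    return q_max * nu * lattice_total / protein_total


if __name__ == "__main__":
    # Parameters quoted for recombinant Sac7d with poly[dGdC]·poly[dGdC];
    # the DNA concentrations (in base pairs) are arbitrary illustration values.
    for dna_bp in (5e-6, 25e-6, 100e-6):
        q = fractional_quenching(7.3e-6, dna_bp, K=2e7, n=3.4, q_max=0.87)
        print(f"{dna_bp * 1e6:6.1f} µM bp -> quenching {q:.3f}")
```

A real fit would wrap `fractional_quenching` in a least-squares routine and vary K, n, and the maximal quenching; that step is omitted here.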

The binding of Sac7 to poly[dAdT]·poly[dAdT] significantly
stabilizes the DNA double helix against thermal
denaturation. The UV melting curve of poly[dAdT]·poly[dAdT]
in 0.01 M KH2PO4 is very sharp and has a Tm of
43.5 °C (Figure 9). In the presence of native Sac7d protein,
Figure 7: Double-quantum filtered (DQF-COSY) α to amide
proton correlation spectra of the native Sac7 (A) and recombinant
Sac7d (B) proteins at 35 °C in 90% H2O/10% D2O, pH 4.1. The
protein concentrations in both spectra were approximately 10 mM.

the melting profile of poly[dAdT]·poly[dAdT] broadens and
the Tm increases. At the highest protein concentration used
in this series of experiments, the DNA melting temperature
was increased about 33 °C above that of poly[dAdT]·poly[dAdT]
alone. The recombinant protein increases the Tm of
poly[dAdT]·poly[dAdT] by a similar amount. However, the
recombinant protein differs in that it aggregates as the
double-stranded poly[d(AT)] melts. CD measurements of the
suspension, and the supernatant after allowing the aggregate
to settle, indicate no major conformational changes during
aggregation of the protein-DNA mixture.
Figure 8: Reverse titrations of the native Sac7 (solid circles) and
recombinant Sac7d (open circles) proteins with poly[dGdC]·poly[dGdC]
at pH 7.0 (0.01 M KH2PO4), 25 °C with 6.6 µM Sac7
proteins and 7.3 µM Sac7d. The smooth curves through the data
are overlays of simulations using a noncooperative McGhee-von
Hippel model (McGhee & von Hippel, 1974). For the native Sac7
proteins this corresponds to a site size of 7 bases (3.5 base pairs),
maximal quenching of 88%, and an intrinsic association constant
of 2 × 10⁷ M⁻¹. For the recombinant Sac7d protein this corresponds
to a site size of 6.8 bases (3.4 base pairs), maximal quenching of
87%, and an association constant of 2 × 10⁷ M⁻¹.




Figure 9: Thermal denaturation of poly[dAdT]·poly[dAdT] monitored
by changes in UV absorbance at 262 nm in 0.01 M KH2PO4,
pH 7.0, plotted against temperature (°C). The melting of
poly[dAdT]·poly[dAdT] is shown alone
(open triangles), with native Sac7 proteins (solid circles), and with
recombinant Sac7d (open circles). The concentration of
poly[dAdT]·poly[dAdT] was 70 µM (nucleotides), and the concentration
of protein was 350 µM.

Thermal Stability. Sac7 proteins are highly thermostable,
as expected from their origin. Native Sac7 and recombinant
Sac7d samples heated to 100 °C showed no precipitation or
cloudiness, although some increase in scattering was noticeable
in the UV spectrum. The proteins unfold reversibly, as
indicated by the observation of similar endotherms with
repetitive DSC scans up to 100 °C.

The native Sac7 proteins show a DSC endotherm at pH
6.0 (0.01 M KH2PO4, 0.1 M KCl, 0.001 M EDTA) with a
Tm of 99.0-100.2 °C (data not shown). By comparison, the
native Sso7 protein has a Tm of 99.4 °C under similar
conditions (data not shown). A precise midpoint for the
unfolding transition is difficult to define since data above
Figure 10: Differential scanning calorimetry (DSC) of native Sac7
(solid circles) and recombinant Sac7d (open circles) proteins at pH
4.0 (0.3 M KCl, 0.05 M potassium acetate). Protein concentrations
were 1.5 mg/mL of native Sac7 proteins and 1.38 mg/mL of
recombinant Sac7d. Smooth curves through the data are nonlinear
least-squares fits with Tm = 80.3 °C, ΔHcal = 53.0 kcal/mol, ΔHvH
= 49.6 kcal/mol for the recombinant protein; and Tm = 86.8 °C,
ΔHcal = 56.4 kcal/mol, ΔHvH = 60.3 kcal/mol for the native protein.
100 °C cannot be collected in water in the MC2 calorimeter.
Notably, the unfolding of the native Sac7 proteins is
remarkably reversible, as indicated by essentially 100%
reproducibility of successive scans on the same sample
following cooling. The recombinant Sac7d protein unfolds
at pH 6.0 (0.01 M KH2PO4, 0.1 M KCl, 0.001 M EDTA)
with a Tm of 92.7 °C, or approximately 7 °C less than the
native.

A reliable analysis of the DSC endotherms requires a more
complete delineation of the endotherm, which can be obtained
by lowering the pH and increasing the salt concentration to
shift the endotherms to lower temperature. At pH 4.0 (0.05
M potassium acetate, 0.3 M KCl) the native protein unfolds
with a Tm of 86.8 °C (Figure 10). The endotherm can be fit
with a van't Hoff enthalpy of 60.3 kcal/mol and a calorimetric
enthalpy of 56.4 kcal/mol, i.e., a ΔHcal/ΔHvH of 0.94,
indicating that the native protein exists as a monomer under
these conditions and unfolds in an all-or-none fashion with
no significant, populated intermediates.

The recombinant Sac7d protein similarly unfolds reversibly
at pH 4.0 (0.05 M potassium acetate, 0.3 M KCl) but with
a midpoint temperature of 80.3 °C (Figure 10), or 6.5 °C
less than the native protein. It unfolds with a van't Hoff
enthalpy of 49.6 kcal/mol and a calorimetric enthalpy of
53.0 kcal/mol, i.e., a ΔHcal/ΔHvH of 1.07. The identity, within
experimental error, of the calorimetric and van't Hoff
enthalpies indicates that the recombinant protein also exists
as a monomer under these conditions and unfolds via a
two-state reaction.
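
The two-state criterion used in the preceding paragraphs is a simple ratio test, sketched below in Python with the fitted enthalpies quoted in the text. The function name and dictionary layout are illustrative; a ratio near 1 is the conventional signature of a monomeric, two-state transition.

```python
def cooperativity_ratio(dh_cal, dh_vh):
    """Ratio of calorimetric to van't Hoff enthalpy (dimensionless)."""
    return dh_cal / dh_vh


# (calorimetric, van't Hoff) enthalpies in kcal/mol at pH 4.0, from the text.
fits = {
    "native Sac7": (56.4, 60.3),
    "recombinant Sac7d": (53.0, 49.6),
}

ratios = {name: round(cooperativity_ratio(cal, vh), 2)
          for name, (cal, vh) in fits.items()}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: dHcal/dHvH = {ratio:.2f}")
```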

DISCUSSION

We report here the cloning and sequencing of two genes
from S. acidocaldarius coding for Sac7 proteins which
correspond to Sac7d and Sac7e. The sac7d and sac7e genes
differ at only 16 positions within the coding region (underlined
in Figure 3); three of these differences are transversions
while the rest are transitions. The sac7d and sac7e genes
code for 66 and 65 amino acid proteins, respectively. The
deduced amino acid sequences are in complete agreement
with the published sequences for both proteins (Kimura et
al., 1984; Choli et al., 1988a) with the exception of initiator
methionines at the amino termini and an additional lysine
(Lys66) at the carboxy terminus of the Sac7d protein in the
deduced sequence. The additional lysine can be explained
either by a failure to discern the final lysine in the amino
acid sequencing of the Sac7d or by posttranslational
carboxy-terminal processing to produce the mature protein. It should
be noted that Sac7d, Sac7e, and Sso7d all terminate with at
least two lysine residues (Figure 1).
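
The transition/transversion tally reported above for the two coding regions can be reproduced from any pair of aligned sequences with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the example strings are short made-up sequences, not the sac7d and sac7e coding regions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a, b):
    """Classify one aligned base pair as identical, transition, or transversion."""
    a, b = a.upper(), b.upper()
    if not {a, b} <= PURINES | PYRIMIDINES:
        raise ValueError(f"unexpected base in pair {a!r}/{b!r}")
    if a == b:
        return "identical"
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_substitutions(seq1, seq2):
    """Count transitions and transversions between two aligned, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1, seq2):
        kind = classify_substitution(a, b)
        if kind != "identical":
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy sequences for illustration.
    print(count_substitutions("ACGTAC", "GCGTCA"))
```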

The data presented here indicate that there are only two
Sac7 protein genes in S. acidocaldarius. Genes coding for
Sac7 proteins other than Sac7d and e could not be detected.
The failure to detect genes for the Sac7a and b proteins and
the fact that the proteins appear to be simply truncated at
the carboxy termini to various extents suggest that Sac7a
and b result from either posttranslational modification at the
carboxy terminus or proteolysis during protein isolation
and purification.

Promoter elements consistent with the archaeal "A-box" 
and "B-box" consensus sequences have been located up- 
stream of the sac7d and sac7e protein coding sequences. The 
agreement of the "A-box" sequence of sac7d with the 
consensus "A-box" sequence is greater than that for the 
sac7e. This difference between the "A-box" of the promoter 
elements in the two genes may explain the higher levels of 
Sac7d relative to Sac7e in vivo (Grote et al., 1986). 

There is significant sequence similarity in the regions of 
sac7d and sac7e extending from the 5' end of the "A box" 
to the initiation codon when the corresponding "A-" and "B-" 
boxes are aligned. The two sequences also have similarly 
placed pyrimidine-rich regions downstream of their termination
codons. These regions show similarity to the transcription
termination signals described for the Sulfolobus virus-like
particle, SSV1, where transcription termination has been
shown to occur within pyrimidine-rich regions directly 3'
of the consensus TTTTTYT [reviewed in Brown et al.
(1989)]. Northern analysis of S. acidocaldarius RGJM RNA
probed with an oligonucleotide (oligonucleotide F, Table 1)
complementary to the common sequence at residues 305-324
of the two sac7 genes (Figure 3) showed hybridization
to a single size of transcripts (Shao and Gupta, unpublished
results), indicating that both transcripts terminate in similarly
placed regions. Thus, it is likely that the conserved
oligopyrimidine sequences of the two genes contain the
transcription termination signals.

Although the regions associated with transcription termination
are highly homologous, the sequences between these
regions and the termination codons are significantly different 
in the sac7d and sac7e genes. Similarly, though the regions 
encompassing the putative core promoter elements in the two 
genes ("A-" and "B-" boxes) share extensive homology, the 
sequences 5' of the "A-box" show less similarity. It would 
appear that sufficient time has elapsed since the supposed 
original gene duplication for the two sequences to diverge. 
The conservation of cis-regulatory elements along with
coding regions in the two genes indicates that there is a
selective pressure to maintain not only the expression of both
gene products but also a large part of their sequence. It is
not clear if there is more than one form of the Sso7 proteins.

A typical ribosome binding site sequence upstream of 
initiator ATG is not observed in either of the two sac7 genes 
Figure 11: Potential secondary structures for the 5'-terminal
regions of the sac7 RNA transcripts determined using Mulfold
(Jaeger et al., 1989a,b; Zuker, 1989). Initiator codons are shown
in lower case. Putative ribosome binding sequences GGUGA and
AGGU are indicated in bold and underlined formats, respectively.
Note that the AGGU sequences within the two transcripts are
located at different positions.

(Figure 3). This is not unusual, since many other Sulfolobus
genes also lack these sites (Amils et al., 1993; Dalgaard &
Garrett, 1993). However, potential ribosome binding sites
are observed downstream of the initiator codons of the two
sac7 genes, which have precedents in other archaea. The
ribosome binding sites in certain halobacterial genes, which
have very short or no 5' untranslated regions, occur within
loops of potential hairpin structures in the 5' regions of the
transcripts (Brown et al., 1989; Amils et al., 1993). The
hairpin arrangement probably exposes these sites for interaction
with 16S rRNA. We note that the 5' regions of the
two sac7 transcripts can be folded into secondary structures
as shown in Figure 11. The sequence UCACCU near the 3'
end of 16S rRNA of Sulfolobus (Woese et al., 1984; Olsen
et al., 1985) potentially can either form five base pairs with
GGUGA within codons 1-3 or form four base pairs with
AGGU within codons 3-4 of the sac7d transcript. Corresponding
sequences in the sac7e transcript are GGCAA and
AAGU, respectively, which cannot form similar pairs with
the 16S rRNA. However, further downstream in the sac7e
transcript, there is AGGU within codons 5-6, which can
form four base pairs with the same UCACCU sequence of
the 16S rRNA; the corresponding site in sac7d is the less
efficient AAGU. Parts of these potential ribosome binding
sites do occur within single-stranded regions (Figure 11), as
is the case for the above-mentioned halobacterial genes.
The differences between the sequences and locations of the
potential ribosome binding sites of the two sac7 transcripts,
along with the previously mentioned differences in the "A-box"
sequences, may also explain the higher synthesis of
Sac7d protein.
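
The base-pairing counts quoted above can be verified with a small Python sketch. It scores only contiguous antiparallel Watson-Crick pairs (G-U wobble pairs are deliberately ignored, which is a simplifying assumption), and the function names are illustrative.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def longest_pairing(site, rrna_tail):
    """Longest contiguous run of Watson-Crick pairs between an mRNA site
    and the rRNA sequence, both written 5' to 3'."""
    target = reverse_complement(rrna_tail)
    site = site.upper()
    best = 0
    for start in range(len(site)):
        for end in range(start + 1, len(site) + 1):
            if site[start:end] in target:
                best = max(best, end - start)
    return best


RRNA_TAIL = "UCACCU"  # near the 3' end of Sulfolobus 16S rRNA, per the text

if __name__ == "__main__":
    for site in ("GGUGA", "AGGU", "GGCAA", "AAGU"):
        print(site, longest_pairing(site, RRNA_TAIL))
```

Under this scoring, GGUGA and AGGU pair along their full lengths (five and four base pairs), while GGCAA and AAGU reach only two contiguous pairs each.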

Kimura et al. (1984) have previously noted that the 
clustering of lysines in the amino terminus of these proteins 
is reminiscent of that observed in eukaryotic HMG proteins. 
Choli et al. (1988b) have also pointed out a slight sequence 
similarity with E2A DNA-binding protein from adenovirus. 
An extensive search of the currently available sequence 
databases showed no significant homologies between the 
Sac7d protein and any known chromatin or DNA-binding 
protein. A BLAST search using the Sac7d sequence picked
up a 100% homology with the amino-terminal sequence (only
12 amino-terminal residues are known) of a small protein
(accession number S21168) from S. solfataricus which 
apparently catalyzes disulfide bond formation (Guagliardi 
et al., 1992). This report should be viewed with caution due 
to the loss of activity upon cation exchange chromatography 
of the protein. BLAST also picked up a high homology with
a reported p2 ribonuclease (Fusi et al., 1993) from S.
solfataricus with a sequence identical to the Sso7d protein
(Choli et al., 1988a). RNase activity for the 7 kDa proteins
is surprising and remains to be confirmed. Preliminary
experiments indicate that the recombinant Sac7d protein does
not have RNase activity (Edmondson and Shriver, unpublished
results). The BLAST search also picked up a
weak homology with the 30S ribosomal protein S5 from E.
coli (P02356) and heat shock protein X16 from the African
clawed frog (A22175). A FASTA search using the Sac7d
sequence revealed some homology with elongation factor
1-δ (P29692), 30S ribosomal protein S8 (P24353), and DNA-
directed RNA polymerase subunit A' (P31813). A PROSITE
search using the Sac7d sequence revealed phosphocre[illegible]
kinase phosphorylation sites at residues 17-19 (TSK),
40-42 (TGR), and 46-48 (SEK), and creatine kinase
phosphorylation sites at 33-36 (TYDD) and 46-49 (SEKD).
A BLOCKS analysis provided a single meaningful match
with ribosomal S5 protein.

We have expressed the sac7d gene in the tightly controlled
BL21(DE3)pLysS E. coli expression system developed by
Studier et al. (1990) using the pET series of plasmids.
Accumulation of the sac7d gene product appears to be lethal
in E. coli. This is indicated perhaps most clearly by the
inability to establish the pET-3b/sac7d construct in
BL21(DE3). The additional regulation provided by the T7
lysozyme inhibition of T7 polymerase appears to be required.
The purified, recombinant protein can be isolated in
reasonable yield, e.g., typically, about 1 mg of protein per [illegible]
of wet weight E. coli cells is obtained, or approximately t[illegible]
that obtained for the native protein from S. acidocaldarius.
We have been unsuccessful in expressing the sac7e gene,
possibly due to its usage of codons rare in E. coli.

The recombinant Sac7d protein appears to be essentially
identical to the native Sac7 proteins in all respects except
for stability. The UV spectral extinction coefficients are
identical, as are the fluorescence excitation and emission
spectra. This is perhaps not surprising given that both are
largely due to a single tryptophan on the surface of the
protein (Edmondson, Qiu, and Shriver, manuscript submitted)
[see also Baumann et al. (1994) for the structure of Sso7d],
although the two tyrosines should be sensitive to differences
in structure. CD spectra are more sensitive to differences
in secondary structure content, and the spectra of the two
proteins are essentially identical, again indicating similar
structures for native and recombinant protein.

Analyses of the CD spectra using the variable selection
method of Johnson (Manavalan & Johnson, 1987) indicate
that Sac7d consists of 31% helix and 22-25% β-sheet. This
differs from the 52% α-helix, 12% β-sheet predicted by
sequence analysis algorithms in this work and the [illegible]%
α-helix, 15% β-sheet predicted by Choli et al. (1988a) using
the average of four different prediction methods. All of these
methods significantly underestimate the amount of β-sheet
in Sac7d (42%) as determined from the NMR solution
structure (Edmondson, Qiu, and Shriver, manuscript submitted)
[see also Baumann et al. (1994)]. However, the helical
content determined by CD (31%) is close to that of the NMR
solution structure (22% α-helix, 11% 3₁₀-helix). An analysis
of the CD spectrum of Sac7e (Dijk & Reinhardt, 1986) using
the PG method (Provencher & Glöckner, 1981) gave a much
better estimate of β-sheet content (44%) but underestimated
the helical content (15%). The CD spectrum reported for
Sac7e (Dijk & Reinhardt, 1986) differs quantitatively from
that of native Sac7 and recombinant Sac7d presented here.
Further, the inability of the CD analyses to accurately
estimate the secondary structure content suggests that at least
part of the secondary structure contributions to the CD
spectra of the Sac7 proteins are not well represented in these
sets of reference proteins.

A more detailed, atomic level comparison of the structures 
of the recombinant and native proteins can be obtained from 
NMR. The "fingerprint" region of double-quantum filtered 
COSY spectra of proteins shows the chemical shift correla- 
tions of alpha and NH protons and is exquisitely sensitive 
to the structure of the protein [see, for example, Wishart et 
al. (1992)]. This permits a qualitative comparison of the 
structure of the backbone of the two proteins which is more 
detailed than that provided by optical spectra comparisons. 
The fingerprint regions of native and recombinant Sac7d 
protein are remarkably similar, indicating that the two 
proteins have very similar backbone folding patterns. 

The binding of the Sac7 proteins to double-stranded DNA
leads to a dramatic decrease in intrinsic tryptophan fluorescence.
The large signal allows for essentially noise-free
titrations and accurate comparisons of the native and
recombinant protein binding function. The data presented
here indicate an affinity of 2 × 10⁷ M⁻¹ and site size of 3.5
base pairs for poly[dGdC]·poly[dGdC]. The agreement of
quantitative binding parameters obtained for the native and
recombinant proteins is additional evidence for essentially
identical global folds for the two proteins. These binding
studies are the first quantitative analysis of the binding of
the Sac7 proteins to DNA.

Various prior studies of the 7 kDa DNA-binding proteins 
from Sulfolobus have characterized the binding to nucleic 
acids in a qualitative manner. Electron micrographs of the 
7 kDa proteins from S. acidocaldarius complexed with DNA 
indicated that the helix becomes increasingly compacted with
increasing ratios of protein to DNA (Dijk & Reinhardt, 1986; 
Lurz et al., 1986). Filter binding studies confirmed that the 
7 kDa proteins had an affinity for pBR322 DNA even at 
relatively high salt concentrations (e.g., 0.265 M NaCl) which 
was comparable to that observed for E. coli HU protein 
(Grote et al., 1986; Choli et al., 1988a). Characterization
of the affinity for DNA in this work was in terms of percent 
bound at a specific ratio of protein to DNA. DNA-melting 
studies have also been performed on a small DNA-binding 
protein from S. acidocaldarius, HSNP-C', with an amino acid
composition similar to the Sac7e protein, although the
sequence has not been presented. The protein increases the
Tm of double-stranded DNA (Reddy & Suryanarayana, 1989).
In addition, this protein demonstrated a significant quenching 
of its intrinsic tryptophan fluorescence upon DNA binding, 
although no quantitative analysis of the titrations was 
performed. 

Baumann et al. (1994) have recently presented some
fluorescence binding data for the homologous Sso7 proteins
from S. solfataricus. A quantitative analysis of the titrations
was not performed, but a visual inspection of the data
indicates a binding site size for double-stranded DNA of six
base pairs in low salt (0.02 M Tris, pH 7.4), nearly twice
that presented here. Assuming a site size of 3-6 base pairs,
the binding affinity in low salt is approximately 0.5 to 1 ×
10⁶ M⁻¹. The thermal stability of poly[dIdC]·poly[dIdC] was
increased by approximately 40 °C in 5 mM Tris (pH 7.0).

The unfolding of both the native and recombinant proteins 
is reversible, allowing for detailed, accurate characterization 
of the thermodynamics of folding. In contrast to all other 
physical parameters studied here, the energetics of folding 
of the recombinant Sac7d protein differs significantly from 
that of the native Sac7 proteins. The native protein unfolds 
at pH 6.0 at 100 °C, remarkable given the absence of any 
metal cofactors or disulfides. Surprisingly, the recombinant 
protein unfolds with a Tm 6.5 °C less than the native. The 
lower enthalpy of unfolding of the recombinant protein is 
not surprising and most likely results from a positive heat 
capacity change associated with unfolding. Any shift to 
lower temperature of an endotherm associated with a positive 
ΔCp will lead to a decrease in enthalpy since: 
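
The relation that the preceding sentence introduces did not survive in this copy. A likely reading, offered here as an assumption and not as the authors' printed equation, is the standard Kirchhoff relation for a temperature-independent heat capacity change. The symbols ΔH(T) (enthalpy of unfolding at temperature T) and T_ref (any reference temperature) are introduced here for illustration:

```latex
\Delta H(T) = \Delta H(T_{\mathrm{ref}}) + \Delta C_p \,\bigl(T - T_{\mathrm{ref}}\bigr)
```

Under that assumption, with a positive ΔCp, an unfolding transition shifted 6.5 °C lower would show an enthalpy lower by 6.5 × ΔCp, other things being equal.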

It is generally thought that a positive ΔCp of unfolding is 
due to the exposure of internal hydrophobic residues (Sturtevant, 
1977; Privalov & Gill, 1988). The magnitude of the 
change observed here is consistent with that observed for 
other globular proteins (Privalov & Gill, 1988). 

Maras et al. (1992) have previously noted that specific 
lysine monomethylation of glutamate dehydrogenase from 
S. solfataricus might be responsible for enhanced thermal 
stability of this enzyme relative to homologous mesophile 
forms. Baumann et al. (1994) have presented mass spectroscopic 
evidence correlating methylation of the Sso7 
protein with growth temperature, and they have suggested 
that such a modification might be related to the stability of 
the protein. The most straightforward way to determine if 
methylation increases the thermostability of the protein would 
be to compare the stabilities of the protein in its methylated 
and unmethylated forms. Demethylation of the native protein 
is not a trivial control experiment given the lack of 
commercially available demethylases and, most importantly, 
the specificity of reported demethylases (Paik & Kim, 1980). 
In the absence of a demethylase, the preparation of an 
unmethylated form is best accomplished using recombinant 
protein. We have demonstrated here a significant difference 
in the thermostability of native and recombinant Sac7 protein. 
The only known difference between these proteins is the 
ε-amino monomethylation of lysines 5 and 7 in the native 
protein and the initiating methionine in the recombinant 
protein. The lack of Lys66 in the reported amino acid 
sequence of the native protein is presumably a sequencing 
error, and this will be investigated in the NMR analysis of 
the native protein. No other posttranslational modification, 
such as phosphorylation or glycosylation, of the native or 
recombinant Sac7 proteins was detectable. The current 
evidence, therefore, strongly indicates that Sulfolobus can 
increase the thermostability of some of its proteins by specific 
lysine monomethylation. 

We note that the level of specific methylation of Sac7 is 
variable and incomplete, i.e., the native preparation is 
heterogeneous (Kimura et al., 1984; Choli et al., 1988a,b). 
Choli et al. (1988b) report that the degree of monomethylation 
of lysine 4 is 70%, 25%, and 20% in native Sac7a, 
Sac7b, and Sac7d, respectively; while that for lysine 6 is 
50%, 40%, and 50%, respectively. Heterogeneity would be 
expected to lead to broadening of the endotherm, rather than 
narrowing (see Figure 10). It would appear, therefore, that 
stabilization might not require complete methylation of the 
specific lysines. 

Interestingly, we have been unable to increase the stability 
of the recombinant Sac7d protein by nonspecific, reductive 
methylation (McCrary and Shriver, unpublished results), a 
process which leads to predominantly dimethylation (Means 
& Feeney, 1971). Monomethylation changes the pKa of the 
ε-amino group from 9.25 to 10.63, while dimethylation has 
little further effect, giving a pKa of 10.78 (Paik & Kim, 1980). 
Trimethylation returns the pKa to 9.8. Given the small 
change in pKa and the fact the difference is observed even 
at pH 4.0, it is doubtful that an effect of monomethylation 
on stability might be electrostatic in origin. A structural 
explanation of the difference in stability must await a more 
detailed comparison of the structures of the native and 
recombinant proteins. The spectroscopic data presented here 
would indicate that the structural differences are slight. 
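
As a rough check on the electrostatic argument above, the fraction of an amino group that is deprotonated at a given pH follows from the Henderson-Hasselbalch relation. The arithmetic below is an illustration using the pKa values quoted from Paik & Kim (1980) and pH 4.0. It is not a calculation taken from the paper, and the symbol f_deprot is introduced here:

```latex
f_{\mathrm{deprot}} = \frac{1}{1 + 10^{\,\mathrm{p}K_a - \mathrm{pH}}}

% Unmethylated amino group, pK_a = 9.25, pH = 4.0:
f_{\mathrm{deprot}} = \frac{1}{1 + 10^{5.25}} \approx 5.6 \times 10^{-6}

% Monomethylated amino group, pK_a = 10.63, pH = 4.0:
f_{\mathrm{deprot}} = \frac{1}{1 + 10^{6.63}} \approx 2.3 \times 10^{-7}
```

In both cases the group is essentially fully protonated at pH 4.0, so monomethylation would not be expected to change its charge state under those conditions.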
3. I have read and am familiar with the contents of the application. As I understand 
the outstanding rejection, the Examiner believes that the pending claims are overly broad. 
Specifically, the Examiner contends that it would take undue experimentation to identify 
members of the genus of sequence non-specific double-stranded nucleic acid binding proteins 
that have at least 75% identity or at least 85% identity to Sso7d (or Sac7d), or at least 90% 
identity to Sac7d, and that can enhance processivity of a polymerase. The Examiner has now 
cited three publications that allegedly further support the rejection. In the Office Action, the 
references are characterized as showing that a single point mutation in Sso7d can affect the 
function of the nucleic acid binding domain. 

4. It is the intent of this declaration to illustrate how the experiments performed in 
the cited publications in fact support the position that one of skill can successfully employ the 
rich structural Sso7d/Sac7d data available in the art to predict the effects of sequence changes on 
Sso7d/Sac7d function. In the cited references the authors were seeking to investigate Sso7d by 
introducing mutations that were predicted, based on the structure, to negatively affect function. 
Their results validated this approach. In the current invention, the skilled artisan can use this 
same structural information to reasonably predict sequence changes that preserve Sso7d/Sac7d 
function rather than destroy it. Each of the references is individually discussed below. 

5. Wang, et al. Nucl. Acids Res. 32:1197-1207, 2004 ("Wang") 

The Examiner points to Wang as further supporting the rejection because Wang 
teaches that a change in Trp24 of Sso7d significantly reduces the effectiveness of the protein in 
enhancing processivity. Wang is a post-filing publication of my work relating to polymerases 
that are modified by linkage to an Sso7d protein. In one aspect of the experiments presented in 
this article, we determined that Sso7d double-stranded DNA (dsDNA) binding activity is 
important for processivity, as taught in the current application. The interactions between Sso7d 
and dsDNA have been extensively studied. Trp 24 was identified in structural studies to be 
important for binding to dsDNA, as explained on page 1201, column 1 in the last paragraph. 
(Trp24 in Wang corresponds to Trp23 in SEQ ID NO:2 of the application as filed.) The 
referenced structural studies (Gao et al., Nature Struct. Biol. 5:782-786, 1998; and Catanzano, et 
al. Biochemistry 37:10493-10498, 1998) were readily available in the art before our invention. 
We purposefully selected Trp 24 for mutation to further investigate the correlation between DNA 
binding and processivity. We created three mutant Sso7d-polymerase fusion proteins in which 
Trp 24 was replaced with Val, Gly or Glu, with the intent of reducing the ability of Sso7d to bind 
dsDNA and in turn, reduce its ability to enhance the processivity of the DNA polymerase. All 
three mutant fusion proteins exhibited decreased processivity relative to that of the wildtype 
Sso7d-polymerase fusion (see, the first column of page 1201 bridging to the second column), just 
as we expected. Substitution of Trp 24 with Glu, which we expected to exhibit the greatest effect 
because it differs the most from the wild-type residue, also resulted in the greatest decrease in 
processivity. I note that all three mutant Sso7d proteins still retained some ability to enhance 
processivity when compared to the unmodified polymerase (Table 2, page 1202). 

This experiment shows how one of skill in the art makes use of structural 
information to recognize amino acid residues that are expected to be relevant to function. In our 
case, we intentionally selected a residue based on available Sso7d structural data with the 
expectation that we would compromise the function of Sso7d in enhancing polymerase 
processivity. This is precisely what we observed. As noted above, the same structural 
information can be used to select residues that would not be expected to alter Sso7d activity, 
which would be the goal in designing Sso7d variant proteins for use in the invention. 

6. Consonni, et al. Biochemistry 38:12709-12717, 1999 ("Consonni") 

Consonni is cited by the Examiner as providing evidence that the claims are not 
enabled because a single amino acid change (Trp 23 or Phe 31) in Sso7d can alter function. 
However, Consonni also provides another example of how structural information is predictive of 
the functional importance of particular amino acid residues. This paper describes the solution 
structure of an Sso7d mutant protein, F31A, in which an alanine is substituted for a phenylalanine 
residue at position 31. In prior studies cited in Consonni at page 12710 in the second full 
paragraph of the first column, Phe 31 was selected for mutation on the basis of structural data 
that indicated that this residue is located at the core of the aromatic cluster and has tight contact 
with side chains of several residues in the cluster. This residue was therefore predicted to be 
important for stability. I note that this residue is also highly conserved in Sso7 family members, 
as can be seen in a sequence comparison of Sso7d, Sac7d, Sac7a, and Sac7e (see, the Rule 1.132 
Declaration by Peter Vander Horn of record in this application). As the authors expected, the 
mutation of Phe 31 to Ala led to a loss in thermo- and piezostabilities (third paragraph of column 
1, page 12710). The analysis presented in the current Consonni paper relates to the solution 
structure of the F31A mutation, which was performed in order to determine the structural 
changes that were associated with the loss of stability of the mutant protein. 

Consonni observed that in the solution structure of the F31A mutant, the Trp 23 
residue was reoriented such that it pointed inside the aromatic cluster. Given the previously 
identified role of Trp23 in contacting DNA (Trp 23 is the same residue as Trp24 in Wang), the 
authors investigated the DNA-binding activity of the mutant F31A protein. The results showed 
that the binding activity was also impaired, once more highlighting that Trp 23 plays an 
important role in DNA binding, as indicated by the structure. 

Consonni again provides an example of how a practitioner in this art makes use of 
structural information. With regard to the loss of stability observed in the F31A mutant protein, 
it is not surprising that the mutation affected Sso7d stability. It is well known in the field that an 
amino acid with a large, buried hydrophobic side chain stabilizes conformation; it is predictable 
that changing the large hydrophobic side chain to a small side chain would result in a loss of 
stability. It is therefore standard practice in this art to avoid radically mutating such residues, if it 
is desired to preserve function, just as it would be desirable to avoid mutating those residues that 
directly contact DNA to preserve DNA binding function. 

7. Shehi, et al. Biochemistry 42:8362-8368, 2003 ("Shehi") 

Shehi performed studies examining the thermal stability and DNA binding 
activity of Sso7d. I will first review the results of these analyses and then address the specific 
issues raised by the Examiner. 

Shehi structural analysis 

Shehi investigated the function of the C-terminus of Sso7d. The authors explain 
that structural properties of Sso7d were known and provide a brief description of the topology of 
Sso7d on page 8362. Shehi created an Sso7d protein that was truncated at Leu54 (L54A) in 
order to investigate the role of the C-terminal α-helix on stability and DNA binding activity. 
The region targeted in Shehi does not contact the DNA in the structural analysis of Sso7d and 
Sac7d DNA binding interactions. To determine whether deletion of the C-terminal region had 
effects on DNA binding, the authors analyzed the binding of L54A to double-stranded calf 
thymus DNA in comparison to the binding activity of wildtype Sso7d. It was found that the 
association constant for binding of L54A to double stranded DNA was similar to that of Sso7d 
(page 8362 bridging to page 8363 and Figure 4). Thus, deletion of the eight residues at the C- 
terminus of Sso7d did not result in loss of DNA binding activity, which was predictable based on 
the structure. 

The authors also observed that a variant that was truncated at Glu 53 could not be 
isolated under the same conditions that allowed them to isolate L54A and noted that this 
highlights the role that Leu 54 plays in the folding process. Shehi explains that Baumann and 
colleagues (Nat. Struct. Biol. 1:808-809, 1994) in fact described that the side chain of Leu54 is 
packed well against that of Ala50, anchoring the C-terminal end of the chain to the protein core. 
Other investigators also confirmed that Leu54 is involved in strong van der Waals interactions 
with the remaining part of the protein. Thus, the available Sso7d/Sac7d structural data provided 
information on the role of Leu 54 that was borne out by the studies in Shehi. 

Shehi's results are consistent with the analysis of Sso7d structure provided by Dr. 
Vander Horn in his Declaration that is of record in this application. Dr. Vander Horn indicated 
that in the context of DNA binding activity, the alpha helix is highly mutable, as evidenced by 
the fact that natural variation of Sso7 homologs is observed in this domain. Dr. Vander Horn 
cautioned, however, that the naturally occurring mutations in this domain appear to preserve the 
alpha helix. Thus, in designing Sso7d variants for use in the invention, one of skill would 
introduce mutations that preserved structure. I further note that the L54 residue is also conserved 
across the naturally occurring Sso7 proteins, which would be an additional consideration in 
designing variants with the purpose of retaining DNA binding activity. 

Examiner's rejection 

Shehi mentions that there were difficulties in isolating the deletion in which the 
C-terminus was truncated at Glu53 under the same conditions that were used to isolate L54A. 
Shehi also noted that L54A has a limited solubility in aqueous solution. The Examiner contends 
that "both mutations demonstrate the unpredictability of the effect of point mutations in Sso7d on 
any particular function or attribute of Sso7d." However, one of skill cannot conclude from the 
experiments in Shehi that the effects of point mutations at Glu53 or L54 would be unpredictable. 
Shehi investigated deletion mutations, not point mutations. The effects observed in deleting 
most of the C-terminal α-helix therefore cannot be extrapolated to the effects of introducing 
point mutations into that region. Again, in generating functional variants, one of skill would 
employ structural information that is available and information about the conservation of 
sequences/structure among family members in considering potential changes to an amino acid 
sequence. 

In terms of the limited solubility of L54A, the authors believe that this is likely 
due to the loss of three net charges and the exposure of hydrophobic moieties upon deleting the 
last eight residues. It is recognized in the art that changing the charge of a protein and exposing 
hydrophobic residues can influence solubility. The ordinary artisan can additionally consider 
such effects in designing variant Sso7d sequences. I further note that Shehi was examining L54A 
alone, not when fused to a polymerase protein. The limited solubility observed by Shehi under 
these conditions would not necessarily reflect the solubility when the protein is fused to a 
polymerase. 

8. In summary, the references cited by the Examiner repeatedly demonstrate that the 
structural information about Sso7d sequence and function provided a sound basis for accurate 
prediction of effects of Sso7d mutations on DNA binding function. In the current invention, in 
order to generate Sso7d variants that retain DNA binding function, and accordingly, the ability to 



USSN No. 09/870,353 Declaration of Dr. Yan Wang 

Wang Page 7 

enhance processivity, the practitioner would use the same structural information to avoid those 
residues that participate in DNA binding function and have the same reasonable expectation of 
success. 

EXPERIENCE: 

2003-present Principal Scientist/R&D Manager, Dept. of System Integration, GXD, Bio-Rad 
Lab. (Employer change due to acquisition.) 

Lead the Reagent Development R&D group in developing amplification reagents and in 
enzyme improvement. 



2003-2004 Principal Scientist, Department of R&D, MJ Bioworks, Inc. 

Led two R&D groups, Reagents Development and Instrument Analyses, of 10 people 
(scientists and research associates). Oversaw the planning and execution of a number of 
reagents development projects. Ensured support to the instrument development effort. 
Coordinated collaborations with external R&D groups. 



2001-2003 Senior Scientist, Department of R&D, MJ Bioworks, Inc. 

Led the Reagents Development R&D team of 5 people (scientists and research 
associates). Oversaw the development and successful launch of several commercial 
products. 

1998-2001 Associate Scientist/Research Scientist, Department of R&D, MJ Bioworks, Inc. 

Carried out research project developing new technologies that improve the in vitro 
performance of DNA polymerases. Independently conceived and validated the Sso7d 
technology as a novel strategy to improve the processivity of DNA polymerase. The 
development of this idea ensured a strong IP position of MJ Bioworks, changed the R&D 
direction of the company, and eventually enabled the launch of several commercial 
products in the subsequent years. 

1993-1998 Postdoctoral fellow, Christine Guthrie lab, Department of Biochemistry, UCSF 

Carried out research project investigating the structure/function relationship of an RNA 
dependent ATPase and putative RNA helicase, Prp16, which is involved in pre-mRNA 
splicing in yeast. Applied a combined approach of genetics, molecular biology and 
biochemistry in achieving the research objective. 

1986-1992 Graduate student, Peter von Hippel lab, Institute of Molecular Biology, Univ. of Oregon. 

Completed a doctoral thesis project that studies the mechanism of rho-dependent 
transcription termination using combined approaches of biochemical and biophysical 
analyses. Systematically studied the interactions between purified protein and a large 
number of oligomer RNAs with designed sequences to elucidate the effects of nucleotide 
sequence on protein-RNA interactions. 

EDUCATION: 

1986-1992 Ph.D. in Biochemistry, University of Oregon. 

1982-1986 B.S. in chemistry, Beijing University, P.R.C. 



AWARDS: 

1996-1998 The American Cancer Society (ACS) Postdoctoral Fellowship. 

1993-1996 The Damon Runyon-Walter Winchell Cancer Research Postdoctoral Fellowship. 

1991-1992 A Research Fellow of the Institute of Molecular Biology, University of Oregon. 
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